Abstract

Cognitive radio (CR) systems allow opportunistic, secondary users (SUs) to access portions of the spectrum that 
are unused by the network’s licensed primary users (PUs), provided that the induced interference does not compromise 
the PUs’ performance guarantees. To account for interference constraints of this type, we consider a flexible spectrum
access pricing scheme that charges SUs based on the interference that they cause to the system’s PUs (individually, 
globally, or both), and we examine how SUs can maximize their achievable transmission rate in this setting. We show 
that the resulting non-cooperative game admits a unique Nash equilibrium under very mild assumptions on the pricing 
mechanism employed by the network operator, and under both static and ergodic (fast-fading) channel conditions. In 
addition, we derive a dynamic power allocation policy that converges to equilibrium within a few iterations (even for 
large numbers of users), and which relies only on local signal-to-interference-and-noise ratio (SINR) measurements; 
importantly, the proposed algorithm retains its convergence properties even in the ergodic channel regime, despite 
the inherent stochasticity thereof. Our theoretical analysis is complemented by extensive numerical simulations which 
illustrate the performance and scalability properties of the proposed pricing scheme under realistic network conditions. 

Index Terms 

Cognitive radio; multi-carrier systems; interference temperature; pricing; exponential learning. 

I. Introduction 

Greatly raising the bar from previous generation upgrades, current design specifications for 5th generation (5G) 
wireless systems target a massive increase in network capacity, fiber-like connection speeds (well into the Gb/s range), 
and an immersive overall user experience with zero effective latency and response times [1, 2]. As such, the ICT 
industry is faced with a formidable challenge: these ambitious design goals require the deployment of new wireless
interfaces at an unprecedented scale, but the necessary overhaul is limited by the inherent constraints of upgrading an 
entrenched (and often ageing) wireless infrastructure. 

Chief among these concerns is the projected spectrum crunch: if not properly managed, the existing radio spectrum 
will not be able to accommodate the soaring demand for wireless broadband and the ever-growing volume of data 
traffic [3]. To make matters worse, studies by the US Federal Communications Commission (FCC) and the National 
Telecommunications and Information Administration (NTIA) have shown that this vital commodity is effectively
squandered through underutilization and inefficient use: for instance, only 15% to 85% of the licensed radio spectrum 
is used on average, leaving ample spectral voids that could be exploited via efficient spectrum management techniques 
[3, 4]. Accordingly, in this often unregulated context, the emerging paradigm of cognitive radio (CR) has attracted 
considerable interest as a promising way out of the spectrum gridlock [5-8]. 

At its most basic level, cognitive radio comprises a two-level hierarchy between wireless users induced by spectrum 
licensing: the network’s licensed, primary users (PUs) have purchased spectrum rights from the network operator
(often in the form of contractual quality of service (QoS) guarantees), but they allow unlicensed secondary users (SUs)
to access the spectrum provided that the induced co-channel interference (CCI) remains below a certain threshold
[5, 7]. Put differently, by sensing the wireless medium, the network’s cognitive SUs essentially free-ride on the PUs’ 
licensed spectrum and they try to communicate under the constraints imposed by the PUs (though, of course, without 
any QoS guarantees). Thus, by opening up the unused part of the spectrum to opportunistic user access, overall 
spectrum utilization is increased without needing to deploy more (and more expensive) wireless interfaces [6, 9].

Of course, given the non-cooperative nature of this opportunistic framework, throughput optimization in CR environments
calls for flexible and decentralized optimization policies with minimal information exchange between SUs,
PUs, and access points/base stations. In particular, a major challenge involves safeguarding the performance guarantees 
that the network’s licensed primary users have already paid for: if secondary users are allowed to transmit without 
some power/interference control mechanism in place, then the primary users’ QoS requirements may be violated, thus 
invalidating the fundamental operational premise of CR systems. To that end, the authors of [10-13] investigated the 
role of pricing as an effective mechanism to control interference and they provided an energy/cost-efficient formulation 
of the problem where users seek to maximize their transmission rate while keeping their transmit power in check. To 
reach a stable equilibrium state in this setting, several distributed approaches have been proposed, based chiefly on 
reaction functions [10], Gauss-Seidel and Jacobi update algorithms [12], or learning methods [13, 14]; however, these 
works do not distinguish between licensed and unlicensed users, so their results do not immediately apply to CR
networks. 

In CR systems, PU requirements are often treated as interference temperature (IT) [15] constraints that are coupled 
across the network’s SUs, and the theoretical analysis of the resulting system aims to characterize the network’s 
optimum/unilaterally stable equilibrium states and to provide the means to converge to such states [16-20]. These
constraints are then enforced indirectly via exogenous pricing mechanisms that charge SUs based on the aggregate
interference that they cause to the network’s PUs (and, of course, PUs are reimbursed commensurately). In this context,
the authors of [16] introduced a spectrum-trading mechanism based on a market-equilibrium approach [21] and they
provided an algorithm allowing SUs to estimate spectrum prices and adjust their spectrum demands accordingly. More 
recently, to account for the PUs’ maximum interference tolerance, the authors of [18, 19] introduced a game-theoretic 
formulation of CR interference channels where SUs are charged proportionally to the aggregate interference caused; 
then, using variational inequality methodologies, they derived sufficient low-interference conditions under which the 
resulting game admits a unique Nash equilibrium and they proposed a best-response algorithm that converges to this 
equilibrium state. The case of inexact system information was considered in [20] where the authors formulated the 
problem as a (deterministic) robust optimization program which can be solved by Lagrangian dual decomposition 
methods. A game-theoretic account of the impact of IT constraints on system performance is also studied in [22] 
where the authors derive cost-aware optimal power allocation policies by relaxing the problem’s hard IT constraints 
and incorporating an exponential cost in the SUs’ utility functions; in this context, the resulting power allocation game 
admits a unique equilibrium which is also Pareto efficient in the low-interference regime. Finally, by exploiting the 
innate hierarchy between primary and secondary users, the authors of [23] provided a Stackelberg game formulation 
where the system’s PU acts as the leader and seeks to maximize the revenue generated by discriminatory spectrum 
access pricing mechanisms imposed on SUs (the game’s followers).

That being said, the above works focus almost exclusively on wireless systems with static channel conditions where 
the benefits of interference control mechanisms are relatively easy to evaluate; by contrast, very little is known in the 
case where the channels vary with time (e.g., due to user mobility). In the presence of (fast) fading, channel gains are 
typically assumed to follow a stationary ergodic process, so the users’ throughput and induced interference depend 
crucially on the channel statistics. In this stochastic framework, the authors of [14] studied the problem of ergodic rate 
maximization in multi-carrier (MC) systems and derived an efficient power allocation algorithm that allows users to 
attain the system’s capacity; however, no distinction was made between licensed and unlicensed users, so the results 
of [14] do not readily translate to a CR setting. More recently, [24] provided an efficient online learning algorithm 
for unilateral rate optimization in dynamic multi-carrier multiple-input and multiple-output (MIMO) cognitive radio 
systems, but, again, without taking into account any IT constraints imposed by the network’s primary users. 

In this paper, we consider the problem of cost-efficient throughput maximization in multi-carrier cognitive radio
networks where SUs are charged based on the interference that they cause to the network’s PUs (either on an aggregate
or a per-user basis). Our system model is presented in Section II where we consider a general game-theoretic
formulation that is flexible enough to account for aggregate (flat-rate), temperature-based, and per-user pricing
schemes. In the case of static channels (Section III), we show that the resulting game admits a unique Nash equilibrium 
almost surely, provided that the SUs’ pricing schemes satisfy some fairly mild requirements (for instance, that a user’s 
transmission cost increases with his radiated power). On the other hand, in the case of fast-fading channels (which 
we study in Section IV), we show that the game under study always admits a unique Nash equilibrium, without any
further caveats.

Moreover, extending the exponential learning techniques of [14], we also derive a dynamic power allocation policy 
that converges to Nash equilibrium in a few iterations, even for large numbers of users and/or subcarriers per user. In 
particular, the proposed algorithm has the following desirable attributes:

1) Distributedness: user updates are based on local information and signal measurements. 

2) Statelessness: users do not need to know the state (or topology) of the system. 

3) Unilateral reinforcement: each user tends to increase his own utility; put differently, the algorithm is aligned with 
each user’s individual objective. 

4) Flexibility: the users’ learning algorithm can be deployed in both static and ergodic (fast-fading) channel environments.

As such, even though the static and ergodic channel regimes are fundamentally different, the network’s users do 
not have to switch their update structure in order to converge to equilibrium (in the static or fast-fading regime, 
respectively). 

Finally, our analysis is supplemented in Section V by extensive numerical simulations where we illustrate the 
throughput and power gains of the proposed approach under realistic conditions. 

II. System Model 

Consider a set $\mathcal{K} = \{1,\dots,K\}$ of (unlicensed) secondary users (SUs) that seek to connect to a common receiver
over a set $\mathcal{S} = \{1,\dots,S\}$ of non-interfering subcarriers (typically in the frequency domain if an orthogonal frequency
division multiplexing (OFDM) scheme is employed). Focusing on the uplink case, the aggregate received signal $y_s$
over the $s$-th subcarrier will then be:

$$y_s = \sum_{k\in\mathcal{K}} h_{ks} x_{ks} + z_s, \tag{1}$$

where

1) $x_{ks} \in \mathbb{C}$ denotes the transmitted signal of user $k \in \mathcal{K}$ over the $s$-th subcarrier.

2) $h_{ks} \in \mathbb{C}$ is the corresponding transfer coefficient.

3) $z_s \in \mathbb{C}$ denotes the aggregate interference-plus-noise received from all sources not in $\mathcal{K}$ (including the aggregate
PU transmission on subcarrier $s$ plus ambient and other peripheral interference effects); throughout this paper (and
by performing a suitable change of basis if necessary), we will model $z_s$ as a Gaussian variable $z_s \sim \mathcal{CN}(0, \sigma_s^2)$
for some $\sigma_s > 0$.

In this context, the average transmit power of user $k$ on subcarrier $s$ will be

$$p_{ks} = \mathbb{E}\big[|x_{ks}|^2\big], \tag{2}$$

where the expectation is taken over the (Gaussian) codebook of user $k$; furthermore, each user’s total transmit power
$\mathbb{E}[x_k^\dagger x_k] = \sum_s p_{ks}$ will have to satisfy the power constraint

$$\sum_{s\in\mathcal{S}} p_{ks} \le P_k, \tag{3}$$

where $P_k > 0$ denotes the maximum transmit power of user $k \in \mathcal{K}$. In this way, the set of admissible power allocation
vectors for user $k$ is the $S$-dimensional polytope

$$\mathcal{X}_k = \Big\{p_k \in \mathbb{R}^S : p_{ks} \ge 0 \text{ and } \sum_{s\in\mathcal{S}} p_{ks} \le P_k\Big\}, \tag{4}$$

and the system’s state space (i.e., the space of all admissible power allocation profiles $p = (p_1,\dots,p_K)$) will be the
product $\mathcal{X} = \prod_k \mathcal{X}_k$.


In this multi-carrier (MC) framework, each user’s achievable transmission rate depends on his individual signal-to-interference-and-noise ratio (SINR)

$$\mathrm{sinr}_{ks}(p) = \frac{g_{ks}p_{ks}}{\sigma_s^2 + \sum_{\ell\neq k} g_{\ell s}p_{\ell s}}, \tag{5}$$

where $g_{ks} = |h_{ks}|^2$ denotes the channel gain coefficient for user $k$ over the $s$-th subcarrier. Thus, in the single user
decoding (SUD) regime (where interference by all other users is treated as additive noise), the maximum information
transmission rate for user $k$ (achievable with random Gaussian codes) will be:


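$$R_k(p) = \sum_{s\in\mathcal{S}} \log\big(1 + \mathrm{sinr}_{ks}(p)\big) = \sum_{s\in\mathcal{S}} \Big[\log\big(\sigma_s^2 + w_s(p)\big) - \log\Big(\sigma_s^2 + \sum_{\ell\neq k} g_{\ell s}p_{\ell s}\Big)\Big], \tag{6}$$

where

$$w_s(p) = \sum_{k\in\mathcal{K}} g_{ks}p_{ks}, \quad s = 1,\dots,S, \tag{7}$$

denotes the aggregate SU interference level per subcarrier (for convenience, we will also write $w = (w_1,\dots,w_S)$ for
the SUs’ aggregate interference profile over all subcarriers $s \in \mathcal{S}$).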
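The quantities in (5) to (7) are straightforward to evaluate numerically. The following Python sketch is a toy illustration with made-up inputs (a gain matrix `g[k][s]`, a power matrix `p[k][s]` and a noise vector `sigma2[s]`, all our own naming); it computes the SINR, the aggregate interference $w_s$, and the rate $R_k(p)$ in both forms of (6), using natural logarithms so that rates are in nats.

```python
import math

def aggregate_interference(g, p):
    """w_s = sum_k g_ks * p_ks for every subcarrier s, as in (7)."""
    K, S = len(g), len(g[0])
    return [sum(g[k][s] * p[k][s] for k in range(K)) for s in range(S)]

def sinr(g, p, sigma2, k, s):
    """SINR of user k on subcarrier s, as in (5); other SUs are treated as noise."""
    others = sum(g[l][s] * p[l][s] for l in range(len(g)) if l != k)
    return g[k][s] * p[k][s] / (sigma2[s] + others)

def rate(g, p, sigma2, k):
    """First form of (6): sum_s log(1 + sinr_ks), in nats."""
    return sum(math.log1p(sinr(g, p, sigma2, k, s)) for s in range(len(sigma2)))

def rate_via_interference(g, p, sigma2, k):
    """Second form of (6): sum_s [log(sigma_s^2 + w_s) - log(sigma_s^2 + w_s - g_ks p_ks)]."""
    w = aggregate_interference(g, p)
    return sum(math.log(sigma2[s] + w[s]) - math.log(sigma2[s] + w[s] - g[k][s] * p[k][s])
               for s in range(len(sigma2)))

# Toy instance (made-up numbers): K = 2 users, S = 2 subcarriers.
g = [[1.0, 0.5], [0.2, 2.0]]
p = [[1.0, 1.0], [1.0, 1.0]]
sigma2 = [1.0, 1.0]
print(rate(g, p, sigma2, 0), rate_via_interference(g, p, sigma2, 0))
```

The two printed values agree up to rounding, since $\sigma_s^2 + w_s - g_{ks}p_{ks}$ is exactly the interference-plus-noise seen by user $k$.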

In the absence of other considerations, the unilateral objective of each SU would be the maximization of his
individual transmission rate $R_k(p)$ subject to the total power constraint (3). In our CR setting, however, the network
operator needs to ensure that the system’s PUs meet the QoS guarantees that they have already paid for, typically
in the form of minimum rate requirements or maximum interference tolerance per subcarrier. To achieve this,
we will consider a general spectrum access pricing scheme whereby SUs are charged according to the individual and
aggregate interference that they cause to the network’s PUs.

Formally, this can be captured by the general cost model:

$$C_k(p) = \pi_0(w(p)) + \pi_k(p_k), \tag{8}$$


where:

1) $\pi_0 : \mathbb{R}_+^S \to \mathbb{R}_+$ is a flat spectrum access price that is calculated in terms of the aggregate SU interference level $w_s$
per subcarrier $s \in \mathcal{S}$.

2) $\pi_k : \mathcal{X}_k \to \mathbb{R}_+$ is a user-specific price which is charged to user $k \in \mathcal{K}$ based on his individual radiated power
profile $p_k \in \mathcal{X}_k$.

In tune with standard economic considerations on diminishing returns [21], the only assumptions that we will make
for the price functions $\pi_0$ and $\pi_k$ are that:

(A1) Every price function $\pi$ is non-decreasing in each of its arguments.

(A2) Every price function $\pi$ is Lipschitz continuous and convex.

In particular, the convexity assumption (A2) acts as an interference control mechanism for the system: by charging SUs
higher spectrum access prices for the same increase in interference when the network operates in a high-interference
state, SUs are implicitly encouraged to transmit at lower powers, thus creating less co-channel interference (CCI) to
the network’s SUs. In this way, the pricing scheme (8) is flexible enough to account for very diverse pricing paradigms:
if $\pi_0 = 0$, the network’s SUs are charged on an equitable user-by-user basis, based only on the interference
that each individual user induces to the network’s PUs;¹ otherwise, if $\pi_k = 0$, the pricing model (8) allows the network
operator to reimburse infractions to the PUs’ contractual QoS guarantees by imposing an aggregate “sanction” on the
network’s SUs (who were responsible for causing the violation in the first place).

The specifics of the pricing functions $\pi_0$ and $\pi_k$ are negotiated between network users and operators based on their
needs and means, so they can vary widely depending on the context; see e.g. [10, 13, 18, 22, 23]. For concreteness,
we provide below some typical examples of pricing models which we explore further in Section V:²
Model 1. Let $I_s^{\max}$ denote the PUs’ interference tolerance on subcarrier $s$. Then, in the spirit of [18], we define the
linear pricing (LP) flat-rate model as:

$$\pi_0^{\mathrm{LP}}(w) = \lambda_0 \sum_{s\in\mathcal{S}} w_s / I_s^{\max}, \tag{LP}$$

where the pricing parameter $\lambda_0$ represents the price paid by the network’s SUs when saturating the PUs’
interference tolerance. In words, SUs are charged a flat rate which is proportional to the degree of saturation
of the PUs’ interference tolerance level, so the model (LP) treats the PUs’ requirements as a soft constraint.
Model 2. With notation as above, the violation pricing (VP) flat-rate model is defined as:

$$\pi_0^{\mathrm{VP}}(w) = \lambda_0 \sum_{s\in\mathcal{S}} \big[w_s / I_s^{\max} - 1\big]_+, \tag{VP}$$

where $\lambda_0 > 0$ is a sensitivity parameter and $[x]_+ = \max\{x, 0\}$. In this model, SUs are only charged when
the PUs’ interference tolerance is actually violated, and the steepness of the sanction is controlled by the
pricing parameter $\lambda_0$; as such, in the large $\lambda_0$ limit, (VP) treats the PUs’ requirements as a hard constraint
with very sharp violation costs.
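Both flat-rate models are easy to code directly. The Python sketch below is a hypothetical helper set (the function names are ours) that evaluates $\pi_0^{\mathrm{LP}}$ and $\pi_0^{\mathrm{VP}}$ together with the per-subcarrier derivatives $\partial\pi_0/\partial w_s$ that enter the users’ marginal utilities; since (VP) has a kink at $w_s = I_s^{\max}$, the value returned there is one valid subgradient, chosen arbitrarily.

```python
def price_lp(w, i_max, lam0):
    """Linear pricing (LP): pi_0(w) = lam0 * sum_s w_s / I_s^max."""
    return lam0 * sum(ws / im for ws, im in zip(w, i_max))

def price_vp(w, i_max, lam0):
    """Violation pricing (VP): pi_0(w) = lam0 * sum_s [w_s / I_s^max - 1]_+."""
    return lam0 * sum(max(ws / im - 1.0, 0.0) for ws, im in zip(w, i_max))

def dprice_lp(w, i_max, lam0):
    """d pi_0 / d w_s for LP: constant in w."""
    return [lam0 / im for im in i_max]

def dprice_vp(w, i_max, lam0):
    """A subgradient of VP: lam0 / I_s^max above the tolerance, 0 otherwise.
    At the kink w_s = I_s^max we return 0 (an arbitrary but valid choice)."""
    return [lam0 / im if ws > im else 0.0 for ws, im in zip(w, i_max)]

i_max = [1.0, 1.0]
w = [0.5, 2.0]  # first subcarrier below tolerance, second at twice the tolerance
print(price_lp(w, i_max, 3.0))  # 3 * (0.5 + 2.0) = 7.5
print(price_vp(w, i_max, 3.0))  # 3 * (0 + 1.0) = 3.0
```

Both functions are non-decreasing, convex and Lipschitz in $w$, so they satisfy (A1) and (A2); the VP price ignores the first subcarrier entirely because its tolerance is not exceeded.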

In light of all this, the utility of user $k$ is defined as:

$$u_k(p) = R_k(p) - C_k(p), \tag{9}$$

i.e., $u_k(p)$ is simply the user’s achieved transmission rate minus the cost reimbursed to the network operator in order to
achieve it. In turn, this leads to the cost-efficient throughput maximization game $\mathfrak{G} \equiv \mathfrak{G}(\mathcal{K}, \mathcal{X}, \pi)$, defined as follows:

1) The game’s players are the system’s secondary users $k \in \mathcal{K} = \{1,\dots,K\}$.

2) The action set of each player/user is the set of feasible power allocation profiles $\mathcal{X}_k = \{p_k \in \mathbb{R}^S : p_{ks} \ge 0 \text{ and } \sum_{s\in\mathcal{S}} p_{ks} \le P_k\}$.

3) Each player’s utility function $u_k : \mathcal{X} \equiv \prod_k \mathcal{X}_k \to \mathbb{R}$ is given by (9).

In this context, we will say that a power allocation profile $p^* \in \mathcal{X}$ is at Nash equilibrium (NE) when

$$u_k(p_k^*; p_{-k}^*) \ge u_k(p_k; p_{-k}^*) \quad \text{for all } p_k \in \mathcal{X}_k \text{ and for all } k \in \mathcal{K}, \tag{NE}$$

¹Likewise, $\pi_k$ could also account for the actual cost incurred by the user to recharge the battery of his wireless device, as in [13].

²For simplicity, we focus on the flat-rate case; the corresponding user-specific price functions are defined similarly.
i.e., when each user’s chosen power profile $p_k^* \in \mathcal{X}_k$ is individually cost-efficient given the power profile of his
opponents (so no user has a unilateral incentive to deviate). Accordingly, our goal in the rest of the paper will be
to characterize the Nash equilibria of $\mathfrak{G}$ and to provide distributed optimization methods allowing selfish (and myopic)
SUs to converge to equilibrium in the absence of centralized medium access control mechanisms.

III. Equilibrium Analysis, Learning and Convergence 

In this section, we focus on the characterization of the NE of the cost-efficient rate maximization game $\mathfrak{G}$ and on
how players can attain such a state by means of a simple, adaptive learning process.

A. Equilibrium structure and characterization 

A key property of the rate maximization game $\mathfrak{G}$ is that it admits a potential function [25]:

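Proposition 1. Let $w$ be the aggregate SU interference level defined as in (7). Then, the function

$$V(p) = \sum_{s\in\mathcal{S}} \log\big(\sigma_s^2 + w_s\big) - \pi_0(w) - \sum_{k\in\mathcal{K}} \pi_k(p_k) \tag{10}$$

is an exact potential for the cost-efficient rate maximization game $\mathfrak{G}$; specifically:

$$u_k(p_k; p_{-k}) - u_k(p_k'; p_{-k}) = V(p_k; p_{-k}) - V(p_k'; p_{-k}) \tag{11}$$

for all $p_k, p_k' \in \mathcal{X}_k$ and for all $p_{-k} \in \mathcal{X}_{-k} \equiv \prod_{\ell\neq k} \mathcal{X}_\ell$.

Proof: By inspection: the term $\sum_s \log\big(\sigma_s^2 + \sum_{\ell\neq k} g_{\ell s}p_{\ell s}\big)$ in (6) and the prices $\pi_\ell$, $\ell \neq k$, do not depend on $p_k$, so they cancel in both differences of (11). ■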
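Proposition 1 can also be sanity-checked numerically (this is a check, not a proof). The Python sketch below assumes LP flat pricing plus a hypothetical linear user-specific price $\pi_k(p_k) = c_k \sum_s p_{ks}$ (any convex, non-decreasing choice would do) and compares both sides of (11) for a random unilateral deviation on a random toy network.

```python
import math
import random

def interference(g, p):
    """Aggregate SU interference w_s of (7) on every subcarrier."""
    return [sum(g[k][s] * p[k][s] for k in range(len(g))) for s in range(len(g[0]))]

def utility(g, p, sigma2, k, lam0, i_max, c):
    """u_k = R_k - pi_0(w) - pi_k(p_k): LP flat price, assumed linear user price c[k] * sum_s p_ks."""
    w = interference(g, p)
    S = len(sigma2)
    rate = sum(math.log(sigma2[s] + w[s]) - math.log(sigma2[s] + w[s] - g[k][s] * p[k][s])
               for s in range(S))
    return rate - lam0 * sum(w[s] / i_max[s] for s in range(S)) - c[k] * sum(p[k])

def potential(g, p, sigma2, lam0, i_max, c):
    """V(p) of (10) for the same price functions."""
    w = interference(g, p)
    S = len(sigma2)
    return (sum(math.log(sigma2[s] + w[s]) for s in range(S))
            - lam0 * sum(w[s] / i_max[s] for s in range(S))
            - sum(c[k] * sum(p[k]) for k in range(len(g))))

def random_profile(rng, S, P_k):
    """A random point of X_k: nonnegative powers with total at most P_k."""
    raw = [rng.random() for _ in range(S + 1)]  # the extra entry acts as unused slack
    total = sum(raw)
    return [P_k * x / total for x in raw[:S]]

rng = random.Random(1)
K, S, lam0 = 3, 4, 0.3
g = [[rng.uniform(0.1, 2.0) for _ in range(S)] for _ in range(K)]
sigma2, i_max, c = [0.5] * S, [1.0] * S, [0.05, 0.1, 0.2]
p = [random_profile(rng, S, 1.0) for _ in range(K)]

k = 1
q = [row[:] for row in p]
q[k] = random_profile(rng, S, 1.0)  # unilateral deviation by user k only
lhs = utility(g, p, sigma2, k, lam0, i_max, c) - utility(g, q, sigma2, k, lam0, i_max, c)
rhs = potential(g, p, sigma2, lam0, i_max, c) - potential(g, q, sigma2, lam0, i_max, c)
print(lhs, rhs)  # equal up to floating-point rounding
```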

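Since the price functions $\pi_0$ and $\pi_k$ are convex (and $\log(\sigma_s^2 + w_s)$ is concave in $p$), the potential function $V$ is itself concave (though not necessarily
strictly so; see below). By Proposition 1, it then follows that maximizers of $V$ (which exist because $V$ is continuous and $\mathcal{X}$ is compact) are NE of $\mathfrak{G}$, so the Nash set of $\mathfrak{G}$ is
nonempty; furthermore, with $V$ concave in $p$ and $u_k$ concave in $p_k$, every NE of $\mathfrak{G}$ is also a maximizer of $V$. In this
way, finding the equilibria of $\mathfrak{G}$ boils down to the nonlinear optimization problem:

$$\begin{aligned} &\text{maximize} && V(p), \\ &\text{subject to} && p_{ks} \ge 0, \; \textstyle\sum_s p_{ks} \le P_k. \end{aligned} \tag{12}$$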

Thanks to this formulation, we obtain the following equilibrium uniqueness result for $\mathfrak{G}$:

Theorem 1. Assume that:

(C1) Each user-specific price function $\pi_k$ is strictly increasing in each of its arguments,
or:

(C2) The flat spectrum access price function $\pi_0$ is either gentle enough or steep enough: $0 < \partial\pi_0/\partial w_s < \big(\sigma_s^2 + \sum_k g_{ks}P_k\big)^{-1}$
or $\partial\pi_0/\partial w_s > 1/\sigma_s^2$ for all $w_s$ and for all subcarriers $s \in \mathcal{S}$.

Then, the cost-efficient throughput maximization game $\mathfrak{G}$ admits a unique Nash equilibrium for almost all realizations
of the channel gain coefficients $g_{ks}$. More generally, even if both (C1) and (C2) fail to hold, the set of Nash equilibria
of $\mathfrak{G}$ is a convex polytope of dimension at most $S(K-1)$.


Proof: See Appendix A. 
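One way to read the two branches of condition (C2), offered as an informal aid rather than as the proof of Appendix A, is through the sign of the potential’s dependence on $w_s$. Since $0 \le p_{ks} \le P_k$ on $\mathcal{X}$, every feasible profile satisfies $0 \le w_s \le \sum_k g_{ks}P_k$, and therefore

$$\frac{1}{\sigma_s^2 + \sum_{k} g_{ks}P_k} \;\le\; \frac{1}{\sigma_s^2 + w_s} \;\le\; \frac{1}{\sigma_s^2}.$$

Consequently, the partial derivative of $\log(\sigma_s^2 + w_s) - \pi_0(w)$ with respect to $w_s$, namely $1/(\sigma_s^2 + w_s) - \partial\pi_0/\partial w_s$, is strictly positive throughout $\mathcal{X}$ when $\partial\pi_0/\partial w_s < (\sigma_s^2 + \sum_k g_{ks}P_k)^{-1}$ (the “gentle” case) and strictly negative when $\partial\pi_0/\partial w_s > 1/\sigma_s^2$ (the “steep” case); in either case the flat price never exactly balances the marginal rate gain on any subcarrier.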


Remark 1. The “almost all” part of the statement of Theorem 1 should be interpreted with respect to Lebesgue measure,
i.e., uniqueness holds except for a set of price functions and channel gain coefficients of Lebesgue measure zero. In
particular, if channel gains are drawn at the outset of the game following some fixed, continuous probability distribution
(e.g., induced by the SUs’ spatial distribution), then $\mathfrak{G}$ admits a unique equilibrium with probability 1.


B. Exponential learning and convergence to equilibrium 

The equilibrium characterization of Theorem 1 is crucial from the standpoint of dynamic spectrum management
(DSM) because it guarantees a very robust solution set (a convex polytope); in fact, as we just saw, the game’s
equilibrium set is a singleton under fairly mild conditions on the users’ price functions (e.g., that the user-specific
price functions $\pi_k$ be strictly increasing). That said, given that it is far from clear how the system’s users can compute
the solution of the problem (NE), our goal in this section will be to provide a distributed learning mechanism that can
be employed by the system’s users in order to reach a Nash equilibrium.

Our proposed algorithm will rely on the users’ marginal utilities:

$$v_k(p) = \nabla_k u_k(p), \tag{13}$$

where $\nabla_k$ denotes differentiation with respect to the power profile $p_k$ of user $k$. In particular, writing $v_k = (v_{k1},\dots,v_{kS})$,
some easy algebra yields the component-wise expression

$$v_{ks}(p) = \frac{\partial u_k}{\partial p_{ks}} = g_{ks}\left(\frac{1}{\sigma_s^2 + w_s} - \frac{\partial\pi_0}{\partial w_s}\right) - \frac{\partial\pi_k}{\partial p_{ks}}, \tag{14}$$

which shows that $v_{ks}(p)$ can be calculated by each individual user knowing only their SINR per subcarrier (which is
measured locally) and the functional form of the price functions $\pi_0$ and $\pi_k$ (which are agreed upon by the network’s
SUs and the PU and are thus also known locally). Indeed, Eq. (5) shows that the aggregate interference level on
subcarrier $s$ can be calculated by user $k$ as:

$$w_s(p) = g_{ks}p_{ks} + \sum_{\ell\neq k} g_{\ell s}p_{\ell s} = g_{ks}p_{ks} + \frac{g_{ks}p_{ks}}{\mathrm{sinr}_{ks}(p)} - \sigma_s^2 = g_{ks}p_{ks}\,\frac{1 + \mathrm{sinr}_{ks}(p)}{\mathrm{sinr}_{ks}(p)} - \sigma_s^2, \tag{15}$$

i.e., requiring only local SINR measurements and the knowledge of the user’s channel (which can in turn be obtained
through the exchange of pilot signals). In particular, (15) gives $g_{ks}/(\sigma_s^2 + w_s) = \frac{1}{p_{ks}}\,\frac{\mathrm{sinr}_{ks}}{1 + \mathrm{sinr}_{ks}}$, which is the form used in Algorithm 1. As a result, the marginal utility vectors $v_k$ can be calculated in a completely
distributed fashion with locally available information.
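The locality claim can be illustrated numerically. The Python sketch below assumes LP flat pricing and a hypothetical linear user price $\pi_k(p_k) = c_k \sum_s p_{ks}$ (so $\partial\pi_k/\partial p_{ks} = c_k$); it evaluates $v_{ks}$ from user $k$’s own $p_{ks}$, $g_{ks}$ and measured $\mathrm{sinr}_{ks}$ via (14) and (15), recovers $w_s$ via (15), and compares the result with a central finite difference of $u_k$ computed with full knowledge of the (made-up) network.

```python
import math

def local_marginal_utility(p_ks, sinr_ks, g_ks, dpi0_dws, dpik_dpks):
    """(14) with g_ks / (sigma_s^2 + w_s) rewritten via (15) as sinr / ((1 + sinr) p_ks)."""
    return sinr_ks / ((1.0 + sinr_ks) * p_ks) - g_ks * dpi0_dws - dpik_dpks

def interference_from_sinr(p_ks, sinr_ks, g_ks, sigma2_s):
    """Aggregate interference w_s recovered by user k alone, as in (15)."""
    return g_ks * p_ks * (1.0 + sinr_ks) / sinr_ks - sigma2_s

def utility(g, p, sigma2, k, lam0, i_max, c):
    """Centralized u_k (LP flat price, assumed linear user price), used only for the check."""
    S, K = len(sigma2), len(g)
    w = [sum(g[l][s] * p[l][s] for l in range(K)) for s in range(S)]
    rate = sum(math.log1p(g[k][s] * p[k][s] / (sigma2[s] + w[s] - g[k][s] * p[k][s]))
               for s in range(S))
    return rate - lam0 * sum(w[s] / i_max[s] for s in range(S)) - c[k] * sum(p[k])

# Made-up network: K = 3 users, S = 2 subcarriers.
g = [[1.0, 0.4], [0.3, 1.5], [0.8, 0.2]]
p = [[0.6, 0.3], [0.2, 0.7], [0.5, 0.4]]
sigma2, i_max, c, lam0 = [0.5, 0.5], [2.0, 2.0], [0.1, 0.1, 0.1], 0.2
k, s = 0, 1

others = sum(g[l][s] * p[l][s] for l in range(3) if l != k)
sinr_ks = g[k][s] * p[k][s] / (sigma2[s] + others)  # what user k would measure
v_local = local_marginal_utility(p[k][s], sinr_ks, g[k][s], lam0 / i_max[s], c[k])

h = 1e-6  # step of the central finite difference
up = [row[:] for row in p]
up[k][s] += h
dn = [row[:] for row in p]
dn[k][s] -= h
v_fd = (utility(g, up, sigma2, k, lam0, i_max, c) - utility(g, dn, sigma2, k, lam0, i_max, c)) / (2 * h)
print(v_local, v_fd)
```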

By definition, the users’ marginal utility vectors define the direction of unilaterally steepest utility ascent, i.e., the 
best direction that a user could follow in order to increase his utility. As such, a natural learning process would be for 
each user to track this steepest ascent direction in the hope of converging to a Nash equilibrium; however, given
the problem’s power and positivity constraints, this method may quickly lead to inadmissible power profiles that do
not lie in $\mathcal{X}$, in which case convergence is also out of the question.

To account for these constraints, we will employ an interior point method which increases power on subcarriers
that seem to be performing well, without ever shutting off a particular channel completely. Formally, consider the
exponential regularization map $G : \mathbb{R}^S \to \mathbb{R}^S$ given by

$$G(v) = \frac{1}{1 + \sum_{j=1}^S \exp(v_j)}\,\big(\exp(v_1),\dots,\exp(v_S)\big). \tag{16}$$
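A direct Python transcription of (16) is sketched below. The max-shift is a standard numerical-stability device (our addition, not part of the paper): it leaves the mathematical value of $G(v)$ unchanged while avoiding overflow in `math.exp` for large arguments.

```python
import math

def exp_map(v):
    """G(v) = (exp(v_1), ..., exp(v_S)) / (1 + sum_j exp(v_j)), as in (16)."""
    m = max(max(v), 0.0)            # largest exponent, counting the '1' as exp(0)
    e = [math.exp(x - m) for x in v]
    denom = math.exp(-m) + sum(e)   # equals (1 + sum_j exp(v_j)) * exp(-m)
    return [x / denom for x in e]

print(exp_map([0.0, 0.0]))      # each weight is 1/3
print(exp_map([-20.0, -20.0]))  # low marginal utilities: tiny weights
print(exp_map([800.0, 0.0]))    # no OverflowError; the second weight underflows to 0.0
```

The weights always sum to less than one, which is why multiplying them by $P_k$ keeps every user strictly inside his power budget.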

This map has the property that it assigns positive weight (power) to all subcarriers and exponentially more weight
to the subcarriers $s \in \mathcal{S}$ with the highest marginal utilities $v_s$. Furthermore, if all marginal utilities are relatively low
(indicating high transmission costs), all assigned weights will also be low in order to decrease the user’s cost. With
this in mind, our proposed exponential learning algorithm for cost-efficient rate maximization is as follows:


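Algorithm 1 Exponential Learning for Cost-Efficient Rate Maximization
Parameter: step size $\gamma_n$.
Initialize: $n \leftarrow 0$; scores $y_{ks} \leftarrow 0$ for all $k \in \mathcal{K}$, $s \in \mathcal{S}$.
Repeat
    $n \leftarrow n + 1$;
    foreach user $k \in \mathcal{K}$ do
        foreach subcarrier $s \in \mathcal{S}$ do
            set transmit power $p_{ks} \leftarrow P_k \dfrac{\exp(y_{ks})}{1 + \sum_r \exp(y_{kr})}$;
            measure $\mathrm{sinr}_{ks}$;
            update marginal utilities: $v_{ks} \leftarrow \dfrac{1}{p_{ks}}\,\dfrac{\mathrm{sinr}_{ks}}{1 + \mathrm{sinr}_{ks}} - \dfrac{\partial C_k}{\partial p_{ks}}$;
            update scores: $y_{ks} \leftarrow y_{ks} + \gamma_n v_{ks}$;
until termination criterion is reached.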
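To make the recursion concrete, here is a minimal Python simulation of Algorithm 1 on a static toy network. It rests on several assumptions of ours, not the paper’s: all users update synchronously (set powers, then measure, then update scores), the flat price is (LP) with no user-specific price, the step size is $\gamma_n = \gamma_0/\sqrt{n}$ (which satisfies the conditions of Theorem 2), and the simulator evaluates $\mathrm{sinr}_{ks}/((1 + \mathrm{sinr}_{ks})p_{ks})$ through its equivalent form $g_{ks}/(\sigma_s^2 + w_s)$ from (15), which avoids a 0/0 if a power underflows.

```python
import math

def exp_map(y, P):
    """Power allocation P * exp(y_s) / (1 + sum_r exp(y_r)), numerically stabilized."""
    m = max(max(y), 0.0)
    e = [math.exp(v - m) for v in y]
    denom = math.exp(-m) + sum(e)
    return [P * x / denom for x in e]

def potential(g, p, sigma2, lam0, i_max):
    """V(p) of (10) with LP flat pricing and no user-specific price."""
    K, S = len(g), len(sigma2)
    w = [sum(g[k][s] * p[k][s] for k in range(K)) for s in range(S)]
    return sum(math.log(sigma2[s] + w[s]) - lam0 * w[s] / i_max[s] for s in range(S))

def exponential_learning(g, sigma2, P, lam0, i_max, n_iter=2000, gamma0=1.0):
    """Synchronous sketch of Algorithm 1 for static channels."""
    K, S = len(g), len(sigma2)
    y = [[0.0] * S for _ in range(K)]                # scores y_ks <- 0
    for n in range(1, n_iter + 1):
        gamma = gamma0 / math.sqrt(n)                # slowly decreasing step size
        p = [exp_map(y[k], P[k]) for k in range(K)]  # every user sets its powers
        w = [sum(g[k][s] * p[k][s] for k in range(K)) for s in range(S)]
        for k in range(K):
            for s in range(S):
                # v_ks of (14); g_ks / (sigma_s^2 + w_s) equals sinr / ((1 + sinr) p_ks) by (15)
                v = g[k][s] / (sigma2[s] + w[s]) - g[k][s] * lam0 / i_max[s]
                y[k][s] += gamma * v                 # score update
    return [exp_map(y[k], P[k]) for k in range(K)]

# Toy instance (made-up numbers): K = 2 users, S = 3 subcarriers, LP price with lam0 = 0.1.
g = [[4.0, 1.0, 0.25], [0.5, 3.0, 1.0]]
sigma2, P, i_max, lam0 = [1.0] * 3, [1.0, 1.0], [1.0] * 3, 0.1
p_init = [exp_map([0.0] * 3, Pk) for Pk in P]        # all scores zero: P_k / (S + 1) each
p_final = exponential_learning(g, sigma2, P, lam0, i_max)
print("V at start:", potential(g, p_init, sigma2, lam0, i_max))
print("V at end:  ", potential(g, p_final, sigma2, lam0, i_max))
print("final powers:", p_final)
```

In this instance the price is gentle enough that every marginal utility stays positive, so total powers drift toward $P_k$ while each user concentrates on his strongest subcarriers; the potential rises accordingly.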


From an implementation point of view, Algorithm 1 has the following desirable properties:

(P1) It is distributed: users only need local or publicly available information in order to run it.

(P2) It is stateless: users do not need to know the state of the system (e.g., its topology). 

(P3) It is reinforcing: users tend to allocate more power to cost-efficient subcarriers. 

We then obtain: 

Theorem 2. Let $\gamma_n$ be a variable step-size sequence such that $\sum_n \gamma_n = \infty$ and $\sum_{j=1}^n \gamma_j^2 \big/ \sum_{j=1}^n \gamma_j \to 0$. Then, Algorithm
1 converges to Nash equilibrium in the cost-efficient rate maximization game $\mathfrak{G}$.

Proof: See Appendix B. ■

Remark. The condition $\sum_{j=1}^n \gamma_j^2 \big/ \sum_{j=1}^n \gamma_j \to 0$ requires the use of a decreasing step-size $\gamma_n$ (which slows down the
algorithm), but the rate of decay of $\gamma_n$ can be arbitrarily slow, in stark contrast to the much more stringent requirement
$\sum_n \gamma_n^2 < \infty$ that is common in the theory of stochastic approximation [26]. As such, Algorithm 1 can be used with an
effectively constant (very slowly varying) step-size, and still converge to equilibrium; we explore this issue in detail in
Section V.
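The remark can be illustrated with a short computation, assuming the hypothetical schedule $\gamma_n = 1/\sqrt{n}$: its squares form the harmonic series, so $\sum_n \gamma_n^2 = \infty$ and the classical stochastic-approximation condition fails, yet the ratio in Theorem 2 still vanishes (roughly like $\log n / (2\sqrt{n})$). A constant step size, by contrast, keeps the ratio fixed at $\gamma$.

```python
import math

def stepsize_stats(gamma, n):
    """Return (sum_{j<=n} gamma_j^2 / sum_{j<=n} gamma_j, sum_{j<=n} gamma_j^2)."""
    s1 = s2 = 0.0
    for j in range(1, n + 1):
        gj = gamma(j)
        s1 += gj
        s2 += gj * gj
    return s2 / s1, s2

def slow(j):
    return 1.0 / math.sqrt(j)  # squares sum to the (divergent) harmonic series

def const(j):
    return 0.1                 # constant step: the ratio stays at 0.1

for n in (10**2, 10**4, 10**5):
    print(n, stepsize_stats(slow, n), stepsize_stats(const, n))
```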
IV. Fast-Fading and User Mobility


Our analysis so far has focused on static channels, corresponding to wireless users with little or no mobility. In 
this section, we investigate the case of mobile users where the channel gain coefficients evolve over time following a 
stationary ergodic process. 

In this fast-fading regime, the users’ achievable rate is given by the ergodic average [27]:

$$\bar R_k(p) = \mathbb{E}_g\big[R_k(p)\big] = \sum_{s\in\mathcal{S}} \mathbb{E}_g\left[\log\left(1 + \frac{g_{ks}p_{ks}}{\sigma_s^2 + \sum_{\ell\neq k} g_{\ell s}p_{\ell s}}\right)\right], \tag{17}$$

leading to the corresponding average utility functions:

$$\bar u_k(p) = \bar R_k(p) - \mathbb{E}_g\big[C_k(p)\big] = \bar R_k(p) - \mathbb{E}_g\big[\pi_0(w) + \pi_k(p_k)\big], \tag{18}$$

where the expectation $\mathbb{E}_g[\cdot]$ is taken with respect to the law of the channel gain coefficients $g_{ks} = |h_{ks}|^2$ (recall here
that the aggregate SU interference level per subcarrier $w_s = \sum_k g_{ks}p_{ks}$ itself depends on the
realization of the channels). We thus obtain the following game-theoretic formulation of cost-efficient throughput
maximization in the presence of fast fading:


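$$\begin{aligned} &\text{maximize} && \bar u_k(p_k; p_{-k}) \quad (\text{unilaterally for all } k \in \mathcal{K}), \\ &\text{subject to} && p_k \in \mathcal{X}_k. \end{aligned} \tag{19}$$

As in the static regime, we then obtain the following characterization of Nash equilibria: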


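Proposition 2. With notation as above, let

$$\bar V(p) = \sum_{s\in\mathcal{S}} \mathbb{E}_g\big[\log(\sigma_s^2 + w_s)\big] - \mathbb{E}_g\Big[\pi_0(w) + \sum_{k\in\mathcal{K}} \pi_k(p_k)\Big]. \tag{20}$$

Then, $\bar V(p)$ is an exact potential for the ergodic rate maximization game $\bar{\mathfrak{G}} \equiv \bar{\mathfrak{G}}(\mathcal{K}, \mathcal{X}, \pi)$. In particular, if the channels’
law is absolutely continuous with respect to Lebesgue measure (and hence atom-free), $\bar V$ is strictly concave and $\bar{\mathfrak{G}}$ admits
a unique Nash equilibrium.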


Proof: See Appendix C. ■ 
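Although the expectations in (17), (18) and (20) rarely admit closed forms, they are easy to estimate by simulation. The Python sketch below assumes Rayleigh fading, so that each $g_{ks}$ is exponentially distributed with a given mean (our assumption for illustration), together with LP flat pricing and no user-specific price; it returns a Monte Carlo estimate of $\bar u_k(p)$. For a single user on one subcarrier with $\sigma^2 = p = 1$, unit mean gain and $\lambda_0 = 0$, the exact value is $\mathbb{E}[\log(1 + g)] = e\,E_1(1) \approx 0.5963$ (with $E_1$ the exponential integral), which gives a convenient check.

```python
import math
import random

def ergodic_utility_mc(mean_gain, p, sigma2, k, lam0, i_max, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the average utility (18) of user k.

    Assumptions (for illustration only): Rayleigh fading, i.e. g_ks ~ Exponential with
    mean mean_gain[k][s], independent across users and subcarriers; LP flat pricing;
    no user-specific price.
    """
    rng = random.Random(seed)
    K, S = len(p), len(sigma2)
    total = 0.0
    for _ in range(n_samples):
        g = [[rng.expovariate(1.0 / mean_gain[l][s]) for s in range(S)] for l in range(K)]
        w = [sum(g[l][s] * p[l][s] for l in range(K)) for s in range(S)]
        rate = sum(math.log1p(g[k][s] * p[k][s] / (sigma2[s] + w[s] - g[k][s] * p[k][s]))
                   for s in range(S))
        total += rate - lam0 * sum(w[s] / i_max[s] for s in range(S))
    return total / n_samples

estimate = ergodic_utility_mc([[1.0]], [[1.0]], [1.0], 0, 0.0, [1.0])
print(estimate)  # close to e * E_1(1) = 0.5963...
```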

Proposition 2 shows that the inherent stochasticity in the users’ channels actually helps in guaranteeing a very
robust solution set for the cost-efficient throughput maximization problem (19) (see also [14] for a related result in
the context of rate control). On the other hand, the expectation over the users’ channels is typically hard to carry
out (especially beyond the Gaussian i.i.d. regime), so it is not clear how to calculate the ergodic marginal utilities
$\bar v_k(p) = \nabla_k \bar u_k(p) = \mathbb{E}_g[v_k(p)]$. Thus, instead of trying to reach a Nash equilibrium by employing a variant of Alg. 1
run with the users’ ergodic marginal utilities (whose calculation requires considerable computation capabilities and a
good deal of knowledge of the channels’ statistics), we will consider the same sequence of events as in the case of
static channels:

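1) At every update period $n = 1, 2, \dots$, each user $k \in \mathcal{K}$ calculates his instantaneous marginal utility vector $v_k(n)$
following (14):

$$v_{ks}(n) = \frac{1}{p_{ks}(n)}\,\frac{\mathrm{sinr}_{ks}(n)}{1 + \mathrm{sinr}_{ks}(n)} - \frac{\partial C_k}{\partial p_{ks}}\big(p(n)\big). \tag{21}$$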
2) Users update their powers following the recursion step of Alg. 1, and the process repeats.

TABLE I
Simulation Setting

Parameter                               Value
Carrier frequency                       $f_c = 2.4$ GHz
Channel bandwidth                       $B = 10.93$ kHz
Noise spectral density                  $\sigma_s = -173$ dBm/Hz
Maximum transmitting power of SUs       $P_k = 21.03$ dBm
Edge of the simulated square area       $L = 200$ m
Transmitting power of the PU            $P_{\mathrm{PU}} = 30$ dBm
Distance of the PU from the receiver    $d = 50$ m

TABLE II
PU’s Requirements

Data Rate    $I^{\max}$
12.8 kHz     $-68.3$ dBm
16 kHz       $-70$ dBm
32 kHz       $-75.6$ dBm

Remarkably, despite the inherent stochasticity, we have: 

Theorem 3. Assume that the variance of the users’ channel gain coefficients is finite. If Alg. 1 is run with step-sizes
$\gamma_n$ such that $\sum_n \gamma_n = \infty$ and $\sum_{j=1}^n \gamma_j^2 \big/ \sum_{j=1}^n \gamma_j \to 0$, then the users’ power profiles converge (a.s.) to Nash equilibrium in the
cost-efficient ergodic rate maximization game $\bar{\mathfrak{G}}$.

Proof: See Appendix C. ■ 

Remark 2. Thanks to Theorem 3, we see that Algorithm 1 enjoys the additional property: 

(P4) Flexibility: users can apply the algorithm “as-is” in both static and fast-fading environments. 
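The two-step procedure above can be simulated in a few lines. The following Python sketch is a hypothetical toy setup, not the paper’s simulation: at every update period it draws a fresh Rayleigh-fading realization (exponential gains with the given means), forms the instantaneous marginal utilities of (21) under LP pricing with no user-specific price, and feeds them into the score recursion of Algorithm 1 with $\gamma_n = 1/\sqrt{n}$. The term $\mathrm{sinr}_{ks}/((1 + \mathrm{sinr}_{ks})p_{ks})$ is evaluated through its equivalent form $g_{ks}/(\sigma_s^2 + w_s)$ given by (15).

```python
import math
import random

def exp_map(y, P):
    """Power allocation P * exp(y_s) / (1 + sum_r exp(y_r)), numerically stabilized."""
    m = max(max(y), 0.0)
    e = [math.exp(v - m) for v in y]
    denom = math.exp(-m) + sum(e)
    return [P * x / denom for x in e]

def stochastic_exponential_learning(mean_gain, sigma2, P, lam0, i_max, n_iter=2000, seed=0):
    """Algorithm 1 driven by the instantaneous marginal utilities (21) under fast fading."""
    rng = random.Random(seed)
    K, S = len(mean_gain), len(sigma2)
    y = [[0.0] * S for _ in range(K)]
    for n in range(1, n_iter + 1):
        gamma = 1.0 / math.sqrt(n)
        p = [exp_map(y[k], P[k]) for k in range(K)]
        # fresh channel realization for this update period (Rayleigh fading assumed)
        g = [[rng.expovariate(1.0 / mean_gain[k][s]) for s in range(S)] for k in range(K)]
        w = [sum(g[k][s] * p[k][s] for k in range(K)) for s in range(S)]
        for k in range(K):
            for s in range(S):
                v = g[k][s] / (sigma2[s] + w[s]) - g[k][s] * lam0 / i_max[s]
                y[k][s] += gamma * v
    return [exp_map(y[k], P[k]) for k in range(K)]

# Made-up mean gains: K = 2 users, S = 3 subcarriers; LP price with lam0 = 0.05.
mean_gain = [[2.0, 0.5, 0.25], [0.25, 1.5, 0.5]]
p_final = stochastic_exponential_learning(mean_gain, [1.0] * 3, [1.0, 1.0], 0.05, [1.0] * 3)
print(p_final)
```

Note that the update loop is literally the static one; only the source of the gains changes, which is the point of property (P4).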

V. Numerical Results 

To evaluate the performance of the proposed cost-efficient power allocation framework for throughput maximization
in cognitive radio networks, we have performed extensive numerical simulations over a wide range of system
parameters. In what follows, we provide a selection of the most representative cases. 

Throughout this section, and unless explicitly mentioned otherwise, we consider a population of $K = 10$ SUs
uniformly distributed over a square area and $S = 10$ non-interfering subcarriers with channel gain coefficients $g_{ks}$
drawn according to the path-loss model for Jakes fading proposed in [28]; the other relevant simulation parameters are
summarized in Table I. For simplicity, we also assume that $\sigma_s$ and $P_k$ are equal for all $s \in \mathcal{S}$ and all $k \in \mathcal{K}$; finally, we
will assume that PUs have the same interference tolerance level $I^{\max}$ over all subcarriers $s \in \mathcal{S}$.

To begin with, we evaluate the impact of interference pricing on the SUs’ behavior by introducing the violation index

$$\Psi_s = w_s / I^{\max}, \qquad (22)$$

i.e., the amount of interference generated by SUs on the $s$-th subcarrier relative to the PUs’ tolerance. Clearly, $\Psi_s < 1$ means that the system’s interference temperature (IT) requirements are not violated, whereas $\Psi_s > 1$ indicates a violation of the PUs’ contractual QoS guarantees that will have to be reimbursed by the network’s SUs. Accordingly, in Fig. 1, we plot the system’s average violation index $\Psi = \frac{1}{|\mathcal{S}|}\sum_{s\in\mathcal{S}}\Psi_s$ as a function of the pricing parameter $\lambda_0$ for different values of the maximum interference tolerance level $I^{\max}$ under the flat-rate pricing scheme $\pi_0(\mathbf{w})$. As can be seen, if the PUs’ maximum interference tolerance level is low (i.e., $I^{\max}$ is small), SUs violate the resulting interference temperature constraint only if the price parameter $\lambda_0$ is also low. Thus, the PUs’ QoS guarantees are violated only in the “soft pricing” regime where $\lambda_0$ is not high enough to safeguard the PUs’ low interference tolerance. On the other hand, if the cost incurred due to violations is high enough, no violations occur: our simulations show that under both the LP and VP models, there exists a threshold value of $\lambda_0$ above which the violation index at the game’s NE is always less than one, i.e., the interference generated by SUs on each subcarrier never exceeds the PUs’ IT constraints.
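To make the violation index (22) concrete, the sketch below converts per-subcarrier aggregate interference levels from dBm to mW and computes $\Psi_s$, the average $\Psi$, and the list of violated subcarriers. The interference values are made-up illustrative numbers, not simulation outputs.

```python
def dbm_to_mw(x_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (x_dbm / 10)


def violation_indices(w_mw, i_max_mw):
    """Per-subcarrier violation index Psi_s = w_s / I_max, cf. (22)."""
    return [w / i_max_mw for w in w_mw]


def average_violation_index(w_mw, i_max_mw):
    """Average index Psi = (1/|S|) * sum_s Psi_s."""
    psi = violation_indices(w_mw, i_max_mw)
    return sum(psi) / len(psi)


# Illustrative (hypothetical) aggregate interference on three subcarriers.
i_max = dbm_to_mw(-70.0)
w = [dbm_to_mw(x) for x in (-73.0, -70.0, -67.0)]
psi = violation_indices(w, i_max)
violated = [s for s, value in enumerate(psi) if value > 1]  # subcarriers with Psi_s > 1
print(psi, average_violation_index(w, i_max), violated)
```

Since $\Psi$ is an average, it can stay below one while an individual subcarrier is in violation; the per-subcarrier values $\Psi_s$ identify which subcarriers exceed the PUs’ tolerance.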

That being said, increasing the flat-rate pricing parameter $\lambda_0$ can lead to significantly different SU behavior depending on the PUs’ interference tolerance level.³ In fact, under the LP pricing model, SU interference disincentives can become excessive: Fig. 1 shows that transmission costs for high $\lambda_0$ are so high (even for low interference levels) that SUs prefer to shut down and stop transmitting altogether. On the other hand, under the VP model, $\lambda_0$ affects the outcome of the game only if the PUs’ maximum interference tolerance is low: increasing $\lambda_0$ beyond a certain value does not lead SUs to shut down and does not impact their sum-rate at equilibrium, precisely because SUs are charged only if they cause excessive interference to the system’s primary users.
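The qualitative difference between the two flat-rate models can be illustrated with simple stand-in price functions. The exact LP and VP forms are defined earlier in the paper; the sketch below assumes, purely for illustration, a linear charge $\lambda_0\sum_s w_s$ for LP and a charge $\lambda_0\sum_s\max(0, w_s - I^{\max})$ on the excess interference only for VP.

```python
def price_lp(w, lam0):
    """Assumed LP stand-in: charge proportional to the total interference."""
    return lam0 * sum(w)


def price_vp(w, lam0, i_max):
    """Assumed VP stand-in: charge only the interference in excess of I_max."""
    return lam0 * sum(max(0.0, ws - i_max) for ws in w)


i_max = 1.0             # normalized interference tolerance
compliant = [0.5, 0.8]  # every subcarrier below I_max
violating = [0.5, 1.5]  # second subcarrier exceeds I_max

for lam0 in (0.5, 2.0, 8.0):
    print(lam0,
          price_lp(compliant, lam0), price_vp(compliant, lam0, i_max),
          price_lp(violating, lam0), price_vp(violating, lam0, i_max))
```

Under the LP stand-in, raising $\lambda_0$ raises the bill of a compliant SU as well, which is the mechanism that pushes SUs to shut down; under the VP stand-in, a compliant profile pays nothing whatever the value of $\lambda_0$.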

To illustrate the system’s transient phase when users employ Algorithm 1 to optimize their utility, Fig. 2 shows the aggregate interference on a given subcarrier when the interference constraint is set to $I^{\max} = -70$ dBm and users are charged based on the VP flat-rate model. We see there that the PU’s interference constraint is violated only during the first few iterations of the learning process: when the interference in a given subcarrier exceeds the PUs’ tolerance, the SUs experience a sharp drop in their marginal utilities (14) because of the incurred cost $\pi_0^{\mathrm{VP}}(\mathbf{w})$, so Algorithm 1 prompts them to reduce their radiated power in the next iteration in order to avoid further violations. In this way, SU violations are quickly reduced and the users’ learning process converges to a violation-free Nash equilibrium of the cost-efficient throughput maximization game.

In Fig. 3, we evaluate the impact of pricing and power constraints on the system’s performance at Nash equilibrium for different pricing models. Under the VP model, the SUs’ sum-rate at equilibrium is affected by the cost parameter $\lambda_0$ only when $\lambda_0$ is small: the reason for this is that SUs do not violate the PUs’ interference temperature constraints for high $\lambda_0$ (cf. Fig. 1), so their transmit power and sum-rate at equilibrium remain (almost) constant for high $\lambda_0$. On the other hand, as in the case of Fig. 1, Fig. 3 shows that the LP model (solid lines) is strongly affected by the pricing parameter $\lambda_0$ for all values of $\lambda_0$: since increasing $\lambda_0$ in the LP model increases transmission costs across the board, each SU is pushed to reduce his individual transmit power, thus commensurately reducing the induced mutual interference in the network. It is worth noting, however, that increasing transmission costs is not always detrimental to SUs under the LP model: as shown in Fig. 3, there is a pricing parameter region where the overall interference on a given channel

³ Recall here that, under VP, the system’s SUs are not charged when their aggregate interference $w_s$ is lower than $I^{\max}$, and are (steeply) fined otherwise; by contrast, the LP model charges users even when the system’s IT constraints are not violated.
Fig. 1. Violation index as a function of $\lambda_0$ for different values of the maximum interference temperature level $I^{\max}$ under the flat-rate pricing schemes (LP: solid lines; VP: dashed lines).


Fig. 2. Impact of the interference constraint on the evolution of the learning process under the LP model ($I^{\max} = -70$ dBm).


decreases when $\lambda_0$ is increased, thus enabling users to achieve higher data rates (due to the decreased interference on the channel). Nonetheless, in the presence of much higher transmission costs, the radiated power of SUs is too low to carry any significant amount of information, thus leading to a decrease in achievable throughput.

We also show the impact of different system configurations on the achievable SU performance by plotting the users’ average sum-rate at equilibrium for different values of the system’s congestion index, i.e., the ratio $K/S$ between the number of SUs accessing the system and the number of available subcarriers. As expected, networks with low congestion (i.e., $K/S = 0.5$ or $1$) exhibit better performance than highly congested networks (i.e., $K/S = 1.5$): when a higher number of SUs try to access the network, the mutual interference also increases, thus causing considerable losses in throughput and leading SUs to shut down instead of incurring high transmission costs for moderate-to-low gains in throughput.

In Fig. 4 we illustrate how the SUs’ sum-rate at equilibrium varies as a function of the PUs’ interference tolerance $I^{\max}$ for different pricing schemes (linear vs. violation pricing and flat-rate vs. per-user pricing). Clearly, when SU transmission comes at no cost (the $\lambda_0 = 0$ case), the value of $I^{\max}$ does not impact the outcome of the game. On the other hand, when $\lambda_0 > 0$, the SUs’ average sum-rate increases as the PUs’ interference tolerance increases, up to a critical value $I^{\max}_{*}$ where the SUs’ sum-rate attains its maximum. For any tolerance level $I^{\max} > I^{\max}_{*}$, the SUs’ average sum-rate starts decreasing and eventually converges to a well-defined limit value as $I^{\max}\to\infty$, corresponding to the case where the PU allows free access to the leased part of the spectrum. This behavior is similar to what we have already discussed in Fig. 3 and stems from the fact that low prices (small $\lambda_0$) and/or high interference tolerance (large $I^{\max}$) do not provide a strong disincentive for SUs to reduce their power level; as a result, the mutual interference across SUs also increases and leads to a decrease in the achievable performance of the secondary network. Importantly, when $I^{\max}$ is relatively low, the LP and VP models exhibit different behaviors, as reflected in their different sum-rates at equilibrium. By contrast, (LP) and (VP) both tend to zero as $I^{\max}\to\infty$, so their behavior



Fig. 3. Average sum-rate as a function of different pricing models, system configurations and values of the pricing parameter $\lambda_0$ (LP: solid lines; VP: dashed lines).



Fig. 4. Average sum-rate as a function of the maximum interference $I^{\max}$ at the PU for different pricing schemes and values of the pricing parameter $\lambda_0$ under the LP model (LP: solid lines; VP: dashed lines).


for very large $I^{\max}$ is similar and the system converges to the same sum-rate value.

The observed sum-rate maximum for intermediate values of $I^{\max}$ can be explained as follows: in the intolerant regime (small $I^{\max}$), users hardly transmit at all because of the PUs’ strict QoS requirements; on the other hand, in the “open network” regime (large $I^{\max}$), each user selfishly transmits at maximum power in order to maximize his individual throughput (since there is no cost balancing factor), thus increasing interference and reducing the users’ sum-rate (in a manner similar to the classical prisoner’s dilemma). As a result, the SUs’ sum-rate is maximized for an intermediate value of $I^{\max}$ where SUs have to control their power in order to avoid being charged for IT violations: in other words, a proper choice of $I^{\max}$ (or, equivalently, $\lambda_0$) allows SUs to achieve a state which is both unilaterally stable and Pareto efficient (in the sense described above).

Finally, in Fig. 4 we also investigate the difference between flat-rate pricing ($\pi_0$) and per-user pricing ($\pi_k$) models. Both models exhibit similar properties, but for noticeably different values of $\lambda_0$: specifically, to achieve the same sum-rate under per-user pricing, lower values of $\lambda_0$ should be considered, because users are much more sensitive to the value of $\lambda_0$ in the per-user paradigm.

In Fig. 5 we illustrate the transmission rate and revenue achieved by the PU as a function of the pricing parameter $\lambda_0$ for different values of $I^{\max}$ under the LP and VP schemes. Specifically, the PU’s sum-rate is calculated as

$$R_{\mathrm{PU}}(\mathbf{w}) = \sum_{s\in\mathcal{S}}\log\big(1 + \mathrm{sinr}_{\mathrm{PU}}(w_s)\big), \qquad (23)$$

where $\mathrm{sinr}_{\mathrm{PU}}(w_s) = g_{\mathrm{PU}}P_{\mathrm{PU}}/w_s$ is the PU’s SINR on the $s$-th subcarrier, and $g_{\mathrm{PU}}$ and $P_{\mathrm{PU}}$ denote the PU’s channel gain and transmit power, respectively; by the same token, the revenue of the PU is simply $K\pi_0 + \sum_k \pi_k$, i.e., the sum of the charges paid by the SUs. For comparison purposes, we have fixed three different values of the parameter $I^{\max}$ according to different PU minimum data rate requirements (cf. Table II).
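Eq. (23) and the revenue expression translate directly into code. The sketch below assumes natural logarithms (rates in nats); the channel gain, transmit power, interference levels and charges are illustrative numbers only.

```python
import math


def pu_sum_rate(w, g_pu, p_pu):
    """R_PU(w) = sum_s log(1 + g_PU * P_PU / w_s), cf. (23); natural log assumed."""
    return sum(math.log(1.0 + g_pu * p_pu / ws) for ws in w)


def pu_revenue(flat_charge, per_user_charges):
    """Revenue K * pi_0 + sum_k pi_k: every SU pays the flat charge plus its own charge."""
    return len(per_user_charges) * flat_charge + sum(per_user_charges)


rate = pu_sum_rate([1.0, 3.0], g_pu=1.0, p_pu=1.0)  # log 2 + log(4/3) = log(8/3)
revenue = pu_revenue(0.5, [0.1, 0.2, 0.3])          # 3 * 0.5 + 0.6
print(rate, revenue)
```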
Fig. 5. Sum-rate and revenue of the PU and total transmit power of SUs as a function of $\lambda_0$ for different values of the maximum interference temperature level $I^{\max}$ under different pricing schemes (LP: solid lines; VP: dashed lines; minimum data rate: dotted lines).

Importantly, as far as the LP model is concerned, Fig. 5 shows that a high pricing parameter $\lambda_0$ brings no revenue to the PU because it acts as a severe transmission disincentive to the SUs (cf. Fig. 3, where we saw that SUs shut down beyond a certain threshold value of $\lambda_0$). Because of this behavior, there exists a critical value $\lambda_0^*$ of the pricing parameter that maximizes the PU’s revenue: the calculation of this critical value lies beyond the scope of this paper, but it is evident that $\lambda_0^*$ increases when the maximum tolerable interference $I^{\max}$ imposed by PUs also increases. On the other hand, the PU’s revenue under the VP model is almost always zero (or close to zero): the reason for this is that the VP model acts as a soft barrier (which hardens in the large-$\lambda_0$ limit), so users tend to respect the PUs’ requirements and thus incur no transmission-related penalties. In other words, if the PU’s QoS requirements are not too stringent, then the LP model acts as a good source of revenue; otherwise, if the PU’s rate requirements are tight, the VP model guarantees that SUs will respect them but does not generate any income. Also, note that under both the LP and VP models, the rate of the PU is always equal to or higher than his minimum required data rate (dotted lines). This is an important result: it shows that pricing regulates the SUs’ behavior indirectly (based on the PU’s QoS requirements and revenue targets), simply by fine-tuning the exact pricing model and its parameters (e.g., $\lambda_0$).

Figs. 6(a)-6(c) compare the performance of the proposed power allocation scheme to the benchmark case of uniform power allocation, i.e., when SUs transmit at full power and allocate their power uniformly over the available subcarriers, irrespective of the PU’s requirements. For some values of $\lambda_0$, the SUs’ sum-rate under uniform power allocation is higher than the one achieved by the proposed approach, but this comes at the expense of violating the PU’s minimum QoS requirements (which constitutes a contractual breach from the PU’s perspective); by contrast, our approach always respects the PU’s contractual requirements (since the pricing parameter $\lambda_0$ is negotiated with the PU), while guaranteeing high throughput to the SUs. This is seen in Fig. 6(b): the PU’s throughput exceeds the throughput achieved when SUs employ a uniform power allocation policy, except when the PU has no significant QoS requirements ($I^{\max}\to\infty$), in which case the SUs exploit all the available spectrum and the PU’s rate is reduced. Furthermore, in Fig. 6(c) we illustrate the normalized revenue of the proposed approach w.r.t. the revenues generated
Fig. 6. Comparison between the proposed and uniform power allocation approaches: (a) sum-rate of SUs; (b) sum-rate of the PU; (c) normalized revenue of the proposed approach w.r.t. the uniform power allocation policy (LP: solid lines with star and circle markers; VP: dashed lines with star and circle markers).


by uniform power allocation policies. Note that the income generated by the proposed approach is up to 3× higher than the income generated by SUs that are not cost-/energy-aware and transmit naively at full power, using a uniform power allocation policy.⁴ Thus, by fine-tuning his pricing scheme, the PU not only achieves his QoS requirements, but also increases his monetary revenue when facing cost-aware SUs.

In Figs. 7 and 8, we investigate the length of the system’s off-equilibrium phase and the convergence rate of the proposed distributed learning scheme (Algorithm 1). By Theorem 2, the iterates of Algorithm 1 converge to Nash equilibrium when using a step-size sequence $\gamma_n$ such that $\sum_{j=1}^{n}\gamma_j^2 \big/ \sum_{j=1}^{n}\gamma_j \to 0$ as $n\to\infty$. As discussed in [13], a rapidly decreasing step-size sequence slows down the algorithm, so we examine here the use of a fixed step-size to accelerate convergence. This choice makes the algorithm run faster; on the other hand, a fixed step-size may lead to unwanted oscillations around the equilibrium point, thus interfering with the algorithm’s end-state. To account for this, we employ an adaptive search-then-converge (STC) approach [29]: we start with a large, constant step-size which is then decreased as soon as oscillations are detected.⁵ By means of this approach, Algorithm 1 is very aggressive during the first, non-oscillating iterations and becomes more conservative (thus guaranteeing convergence) once oscillations are noticed.
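A minimal sketch of an adaptive search-then-converge schedule in the spirit of the approach just described: the step-size stays at $\gamma_0$ until an oscillation is detected, and then decays polynomially. The oscillation test used here (a sign flip between two consecutive increments of a monitored scalar, such as the potential or a transmit power) and the decay law $\gamma_0/(m+1)^{\beta}$, with $m$ the number of iterations since detection, are our own assumptions; [29] and the paper’s simulations may use different rules.

```python
class STCStepSize:
    """Hypothetical search-then-converge step-size schedule."""

    def __init__(self, gamma0: float = 1.0, beta: float = 0.75):
        self.gamma0 = gamma0    # large constant step for the "search" phase
        self.beta = beta        # decay exponent for the "converge" phase
        self.switch_at = None   # iteration at which an oscillation was first seen
        self._prev = None       # previous monitored value
        self._prev_diff = None  # previous increment of the monitored value

    def observe(self, n: int, value: float) -> None:
        """Record the monitored quantity at iteration n and detect oscillations."""
        if self._prev is not None:
            diff = value - self._prev
            if (self.switch_at is None and self._prev_diff is not None
                    and diff * self._prev_diff < 0):
                self.switch_at = n  # increments changed sign: start decaying
            self._prev_diff = diff
        self._prev = value

    def step(self, n: int) -> float:
        """Step-size to use at iteration n (n >= switch_at once decaying)."""
        if self.switch_at is None:
            return self.gamma0
        return self.gamma0 / (n - self.switch_at + 1) ** self.beta


schedule = STCStepSize(gamma0=1.0, beta=0.75)
for n, value in enumerate([0.0, 1.0, 2.0, 1.5, 1.7]):  # illustrative monitored values
    schedule.observe(n, value)
print(schedule.switch_at, schedule.step(6))
```

With $0 < \beta \le 1$, the tail of this schedule still has a divergent sum and a vanishing ratio $\sum\gamma_j^2/\sum\gamma_j$, so the decaying phase falls within the step-size conditions of Theorem 2.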

To assess the method’s efficiency, we plotted the system’s equilibration level (EQL), defined as:

$$\mathrm{EQL}(n) = \frac{V_n - V_{\min}}{V_{\max} - V_{\min}}, \qquad (24)$$

where $V_n = V(\mathbf{p}(n))$ is the potential (10) of the game at the $n$-th iteration of the algorithm, and $V_{\min}$ ($V_{\max}$) is the minimum (maximum) value of $V$; since the game’s Nash equilibria are the maximizers of $V$, an EQL value of 1 means that the system is at Nash equilibrium.
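Given a trace of potential values, (24) and the “iterations needed to reach a target EQL” metric can be computed as follows. The trace is made up; in practice $V_{\max}$ is the potential at equilibrium and has to be obtained separately, e.g., from a long run of the algorithm (an assumption of this sketch).

```python
def eql(v_n: float, v_min: float, v_max: float) -> float:
    """Equilibration level EQL(n) = (V_n - V_min) / (V_max - V_min), cf. (24)."""
    return (v_n - v_min) / (v_max - v_min)


def iterations_to_level(potentials, v_min, v_max, level=0.95):
    """First iteration index whose EQL reaches `level`, or None if never reached."""
    for n, v_n in enumerate(potentials):
        if eql(v_n, v_min, v_max) >= level:
            return n
    return None


trace = [0.0, 5.0, 9.0, 9.6, 10.0]  # illustrative potential values V_n
print([eql(v, 0.0, 10.0) for v in trace], iterations_to_level(trace, 0.0, 10.0))
```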
Accordingly, in Fig. 7, we show the evolution of the EQL and the system’s sum-rate at each iteration for different step-size rules and interference pricing models. As expected, a conservative step-size of the form $\gamma_n = n^{-\beta}$, $1/2 < \beta < 1$, leads to relatively slow convergence (of the order of several tens of iterations or worse). On the other hand, the use of STC and fixed-step methods greatly accelerates the users’ learning rate: after only a few STC iterations the system’s EQL exceeds 90%, and the algorithm’s convergence is accelerated even further by increasing the constant step-size in the “exploration” phase of the STC method.

⁴ Recall here that the VP model does not generate any revenue so, to reduce clutter, the corresponding curves are not shown.

⁵ Note that such a step-size schedule still satisfies the summability postulates of Theorem 2.

Fig. 7. Equilibration level, EQL(n), for different step-size rules under the LP and VP flat-rate interference pricing models.

Fig. 8. Scalability of the proposed learning scheme as a function of the step-size $\gamma$ for different values of the number $K$ of SUs and pricing schemes ($\lambda_0 = 0.1$: solid lines; $\lambda_0 = 0.5$: dashed lines).

To investigate the scalability of the proposed learning scheme, we also examine the algorithm’s convergence speed for different numbers of SUs. In Fig. 8 we show the number of iterations needed to reach an EQL of 95%: importantly, by increasing the value of the algorithm’s step-size, it is possible to reduce the system’s transient phase to a few iterations, even for large numbers of users. Moreover, we note that the algorithm’s convergence speed under the LP model depends on the pricing parameter $\lambda_0$ (it decreases with $\lambda_0$), whereas this is no longer the case under the VP model. The reason for this is again that the VP model acts as a “barrier” which is only activated when the PUs’ interference tolerance is violated.

Finally, to investigate the impact of mobility and channel fading on the users’ learning process, we consider a system with three SUs ($K = 3$) and three independent and identically distributed (i.i.d.) Gaussian fast-fading orthogonal subcarriers ($S = 3$). In Fig. 9, we plot the system’s EQL with respect to the ergodic potential (20) under the LP model as a function of different price settings and step-size rules. Remarkably, even in this stochastic setting, Algorithm 1 still converges to the game’s NE within a few iterations and, as before, the algorithm’s convergence rate is improved by choosing more aggressive step-size sequences.


VI. Conclusions 

In this paper, we considered a game-theoretic formulation of the problem of cost-efficient throughput maximization 
in multi-carrier CR networks where SUs are charged based on the interference that they cause to the system’s PUs. 
Fig. 9. Equilibration level (EQL) for different values of the pricing parameter $\lambda_0$ and step-size rules under the fast-fading regime.


We showed that the resulting game admits a unique Nash equilibrium under fairly mild conditions (for both static and ergodic channels), and we derived a fully distributed learning algorithm that converges to equilibrium using only local SINR and channel measurements (again, under both static and fast-fading channel conditions). Our analysis shows that the choice of the exact pricing scheme has a strong impact on the network’s achievable performance (for both licensed and unlicensed users): in the “soft-pricing” regime, the PUs’ requirements are violated in exchange for monetary reimbursement; by contrast, higher prices safeguard the PUs’ requirements, but (somewhat surprisingly) generate no revenue for the PUs. Moreover, thanks to the fast convergence of the proposed algorithm, the system’s transient (off-equilibrium) phase is minimized, so SUs avoid being unduly charged for relatively low throughput levels.

Some important questions that remain are the behavior of the system under arbitrarily time-varying channel conditions corresponding to more general fading models (not necessarily following a stationary ergodic process), and the case of imperfect SINR measurements and channel knowledge at the transmitter. We intend to explore these directions in future work.


Appendix 
Technical Proofs 


A. Equilibrium analysis 


Proof of Theorem 1: We will first show that the game’s potential $V$ is strictly concave under assumption (A1) (i.e., if $\pi_k$ is strictly increasing in each of its arguments). To that end, let $V_0 = \sum_{s\in\mathcal{S}}\log(\sigma_s^2 + w_s) - \pi_0$ and $V_+ = -\sum_k \pi_k$, and differentiate $V = V_0 + V_+$ to obtain:


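$$\frac{\partial V}{\partial p_{ks}} = \frac{\partial V_0}{\partial p_{ks}} + \frac{\partial V_+}{\partial p_{ks}} = \frac{\partial V_0}{\partial w_s}\,g_{ks} - \frac{\partial \pi_k}{\partial p_{ks}}, \qquad (25)$$

and hence:

$$\frac{\partial^2 V}{\partial p_{ks}\,\partial p_{\ell s'}} = g_{ks}\,g_{\ell s'}\,\frac{\partial^2 V_0}{\partial w_s\,\partial w_{s'}} - \delta_{k\ell}\,\frac{\partial^2 \pi_k}{\partial p_{ks}\,\partial p_{ks'}} = -g_{ks}\,g_{\ell s'}\,A^0_{ss'} - \delta_{k\ell}\,B^k_{ss'}, \qquad (26)$$

where, in obvious notation:

$$A^0_{ss'} = -\frac{\partial^2 V_0}{\partial w_s\,\partial w_{s'}} \quad\text{and}\quad B^k_{ss'} = \frac{\partial^2 \pi_k}{\partial p_{ks}\,\partial p_{ks'}}. \qquad (27)$$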

Since $V_0$ is strictly concave in $\mathbf{w}$ (as the sum of a strictly concave function and a concave function), it follows that $(A^0_{ss'})$ is positive-definite. Accordingly, since $A^0_{ss'}$ does not depend on $k$, any zero eigenvector $\mathbf{z}$ of the $KS\times KS$ matrix $g_{ks}\,g_{\ell s'}\,A^0_{ss'}$ must satisfy:

$$\sum_{k\in\mathcal{K}} g_{ks}\,z_{ks} = 0 \quad\text{for all } s\in\mathcal{S}. \qquad (28)$$

The degeneracy condition (28) reflects the fact that if $w_s(\mathbf{p}') = \sum_k g_{ks}p'_{ks} = \sum_k g_{ks}p_{ks} = w_s(\mathbf{p})$ for all $s$ and two power profiles $\mathbf{p}, \mathbf{p}'\in\mathcal{X}$, then $V_0(\mathbf{p}) = V_0(\mathbf{p}')$; Eq. (28) shows in addition that $V_0$ admits no other directions along which it is constant. From this, it follows that the kernel $Z$ of $\mathrm{Hess}(V)$ is at most $(KS-S)$-dimensional; since $\arg\max V$ lies in an affine subspace of $\mathbb{R}^{KS}$ that is parallel to $Z$, we conclude that the Nash set of $\mathcal{G}$ is a convex polytope of dimension at most $KS - S$, as claimed.
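The kernel argument around (26)-(28) is easy to check numerically: with a positive-definite $A^0$ and strictly positive gains, the $KS\times KS$ matrix with entries $g_{ks}g_{\ell s'}A^0_{ss'}$ has rank $S$, so its kernel is $(KS-S)$-dimensional and contains the tangent directions $z_{ks} = g_{\ell s}$, $z_{\ell s} = -g_{ks}$ used later in the proof. The gains and $A^0$ below are random stand-ins, not quantities from the paper’s model, and the sketch uses NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S = 4, 3
g = rng.exponential(scale=1.0, size=(K, S))  # stand-in channel gains g_ks > 0
L = rng.standard_normal((S, S))
A0 = L @ L.T + S * np.eye(S)                 # stand-in positive-definite A^0_{ss'}

# G maps a flattened profile z (index k*S + s) to (sum_k g_ks z_ks)_s, the left side of (28).
G = np.zeros((S, K * S))
for k in range(K):
    for s in range(S):
        G[s, k * S + s] = g[k, s]

M = G.T @ A0 @ G  # entry [(k,s),(l,s')] equals g_ks * g_ls' * A0[s, s']

rank = np.linalg.matrix_rank(M)
print("rank:", rank, "kernel dimension:", K * S - rank)

# Tangent direction from the proof: z_ks = g_ls, z_ls = -g_ks (here k=0, l=1, s=0).
z = np.zeros(K * S)
z[0 * S + 0] = g[1, 0]
z[1 * S + 0] = -g[0, 0]
print("||M z|| =", np.linalg.norm(M @ z))
```

Only the $V_0$ part of the Hessian is checked here; the kernel of the full $\mathrm{Hess}(V)$ in (26) is contained in this one, since the $B^k$ terms can only remove directions.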

Assume now that $\mathbf{p}^*$ is a Nash equilibrium of $\mathcal{G}$. If there exists a subcarrier $s\in\mathcal{S}$ such that $p^*_{ks} = 0$ for all $k\in\mathcal{K}$, then any profile with $p_{ks} = P_k$ for all $k\in\mathcal{K}$ cannot be Nash, and vice versa. Thus, without loss of generality (and after relabeling indices if necessary), we may assume that there exists a subcarrier $s\in\mathcal{S}$ such that $p^*_{ks} > 0$ and $p^*_{\ell s} > 0$ for two users $k, \ell\in\mathcal{K}$. With this in mind, assume that every user-specific price function $\pi_k$ is increasing in each of its arguments and consider the tangent vector $\mathbf{z}\in\mathbb{R}^{KS}$ with $z_{ks} = g_{\ell s}$, $z_{\ell s} = -g_{ks}$, and $z_{k's'} = 0$ otherwise. By (28), it follows that

$$f(t) = V(\mathbf{p}^* + t\mathbf{z}) \qquad (29)$$

is constant for all sufficiently small $t > 0$ (note that $\mathbf{p}^* + t\mathbf{z}\in\mathcal{X}$ for small $t > 0$). However, by differentiating, we obtain:


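$$\frac{df}{dt} = \frac{d}{dt}\Big[V_0(\mathbf{p}^* + t\mathbf{z}) - \sum_{k'}\pi_{k'}(\mathbf{p}^* + t\mathbf{z})\Big] = -\frac{\partial \pi_k}{\partial p_{ks}}\,z_{ks} - \frac{\partial \pi_\ell}{\partial p_{\ell s}}\,z_{\ell s} = g_{ks}\,\frac{\partial \pi_\ell}{\partial p_{\ell s}} - g_{\ell s}\,\frac{\partial \pi_k}{\partial p_{ks}}, \qquad (30)$$

so we must have

$$g_{ks}\,\frac{\partial \pi_\ell}{\partial p_{\ell s}}\bigg|_{\mathbf{p}^* + t\mathbf{z}} = g_{\ell s}\,\frac{\partial \pi_k}{\partial p_{ks}}\bigg|_{\mathbf{p}^* + t\mathbf{z}} \quad\text{for all sufficiently small } t > 0. \qquad (31)$$

With $\pi_k$, $\pi_\ell$ strictly increasing, this only holds if $\pi_k$ (resp. $\pi_\ell$) is linear in $p_{ks}$ (resp. $p_{\ell s}$) and the channel gain coefficients $g_{ks}$, $g_{\ell s}$ have the required ratio. This last condition is a (Lebesgue) measure zero event, so our assertion follows.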

Otherwise, assume that (A2) holds, implying in particular that $r_s = (\sigma_s^2 + w_s)^{-1} - \partial\pi_0/\partial w_s$ maintains the same sign for all possible values of $w_s$. Then, in view of the previous discussion, it suffices to prove uniqueness in the special case where the price functions $\pi_k$ are constant in a neighborhood of $\mathbf{p}^*$. In this case, the first-order Karush-Kuhn-Tucker (KKT) conditions for (12) take the form:

a) $r_s\,g_{ks} - \lambda_k \leq 0$, (32a)

b) $p_{ks}\,\big[r_s\,g_{ks} - \lambda_k\big] = 0$, (32b)

where $\lambda_k$ is the Lagrange multiplier corresponding to the total power constraint $\sum_s p_{ks}\leq P_k$ and

$$r_s = \frac{1}{\sigma_s^2 + w_s} - \frac{\partial \pi_0}{\partial w_s}. \qquad (33)$$
Thus, with $r_s\neq 0$ by assumption, we obtain:

$$\frac{g_{ks}}{g_{ks'}} = \frac{r_{s'}}{r_s} \quad\text{for all } s, s'\in\operatorname{supp}(\mathbf{p}^*_k), \qquad (34)$$

i.e., every user $k\in\mathcal{K}$ is “load-balancing” the quantity $g_{ks}\,r_s$ over all employed subcarriers.

By using a graph-theoretic method introduced in [30], we may deduce that the following hold except on a set of (Lebesgue) measure zero:

1) No two users $k, \ell\in\mathcal{K}$ can be using the same two subcarriers $s, s'$ at equilibrium: if this were the case, we would have $g_{ks}/g_{ks'} = g_{\ell s}/g_{\ell s'}$, a measure zero event.

2) There are at most $S-1$ instances of users employing more than one subcarrier. Indeed, assume that user $k_j$ employs subcarriers $s_j, s'_j$, with $j = 1,\dots,N$, $N\geq S$. Then, by the pigeonhole principle, there exists a subset of pairs $(s_j, s'_j)$ that forms a cycle of length $L\leq N$ in the graph with vertex set $\mathcal{S}$. Hence, by relabeling indices if necessary, we obtain the cycle relation:

$$\frac{g_{k_1,s_1}}{g_{k_1,s_2}}\cdot\frac{g_{k_2,s_2}}{g_{k_2,s_3}}\cdots\frac{g_{k_L,s_L}}{g_{k_L,s_{L+1}}} = \frac{r_{s_2}}{r_{s_1}}\cdot\frac{r_{s_3}}{r_{s_2}}\cdots\frac{r_{s_{L+1}}}{r_{s_L}} = 1, \qquad (35)$$

where we have used the fact that $s_{L+1} = s_1$. This represents a measure zero condition, so our assertion follows.

The above shows that $\mathbf{p}^*$ lies in the interior of a face of $\mathcal{X}$ with dimension at most $S-1$. Since the Nash set of $\mathcal{G}$ is a convex polytope of dimension at most $KS - S$, we conclude that any Nash equilibrium lies at the intersection of a $\mathbf{g}$-independent $(S-1)$-dimensional and a $\mathbf{g}$-dependent $(KS-S)$-dimensional subspace of $\mathbb{R}^{KS}$. However, since $KS - S + S - 1 < KS$, the intersection of these subspaces is trivial on a set of full (Lebesgue) measure with respect to the choice of the $\mathbf{g}$-dependent subspace, implying that there exists a unique Nash equilibrium. ■
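The counting step in item 2) can be illustrated with a union-find sketch: each user that splits its power over two subcarriers contributes an edge between them, and any $N\geq S$ edges on the $S$ subcarrier vertices must close a cycle (a forest on $S$ vertices has at most $S-1$ edges), which is what produces a relation of the form (35). The function name is ours, and the exhaustive check for $S = 4$ is only an illustration.

```python
from itertools import combinations


def first_cycle_edge(num_vertices, edges):
    """Return the index of the first edge that closes a cycle, or None (union-find)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, (a, b) in enumerate(edges):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return i  # a and b already connected: this edge closes a cycle
        parent[root_a] = root_b
    return None


S = 4
pairs = list(combinations(range(S), 2))  # possible subcarrier pairs (s, s')
# Every choice of S distinct pairs on S subcarriers contains a cycle.
print(all(first_cycle_edge(S, list(es)) is not None for es in combinations(pairs, S)))
# S - 1 pairs forming a path do not.
print(first_cycle_edge(S, [(0, 1), (1, 2), (2, 3)]))
```

A repeated pair, i.e., two users sharing the same two subcarriers as in item 1), shows up as a cycle of length two.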


B. Convergence of exponential learning 

The basic idea of our convergence proof is as follows: we will first show that the iterates of Algorithm 1 track (in a certain sense that will be made precise below) the “mean-field” dynamics:

$$\dot{\mathbf{y}}_k = \mathbf{v}_k(\mathbf{p}), \qquad p_{ks} = P_k\,\frac{\exp(y_{ks})}{1 + \sum_{s'\in\mathcal{S}}\exp(y_{ks'})}. \qquad (36)$$

Theorem 2 will then follow by showing that the dynamics (36) converge to the maximum set of the game’s potential (and, hence, to Nash equilibrium) for any initial condition $\mathbf{y}(0)$.

For simplicity, in the rest of this appendix (and unless explicitly stated otherwise), we will work with a single user with maximum transmit power $P = 1$; the general case is simply a matter of taking a direct sum over $k\in\mathcal{K}$ and rescaling by the corresponding maximum power $P_k$ of each user. With this in mind, let $\mathcal{D} = \{\mathbf{p}\in\mathbb{R}^S_+ : 0\leq\sum_s p_s\leq 1\}$ denote the standard $S$-dimensional “corner-of-cube”,⁶ and consider the entropy-like function:

$$h(\mathbf{p}) = \sum_s p_s\log p_s + \Big(1 - \sum_s p_s\Big)\log\Big(1 - \sum_s p_s\Big). \qquad (37)$$

⁶ Recall that each user’s action space is a corner-of-cube.
A key element of our proof will be the associated Bregman divergence [31, 32]:

$$D_h(\mathbf{p}^*, \mathbf{p}) = h(\mathbf{p}^*) - h(\mathbf{p}) - \langle\nabla h(\mathbf{p})\,|\,\mathbf{p}^* - \mathbf{p}\rangle = \sum_s p^*_s\log\frac{p^*_s}{p_s} + \Big(1 - \sum_s p^*_s\Big)\log\frac{1 - \sum_s p^*_s}{1 - \sum_s p_s}, \qquad (38)$$

with the continuity convention $0\log 0 = 0$. The Bregman divergence (38) resembles the well-known Kullback-Leibler (KL) divergence in the same sense that $h$ resembles the ordinary Gibbs-Shannon entropy: in particular, by exploiting the properties of the KL divergence, it is easy to see that $D_h(\mathbf{p}^*, \mathbf{p})\geq 0$ for all $\mathbf{p}^*, \mathbf{p}\in\mathcal{D}$, with equality if and only if $\mathbf{p} = \mathbf{p}^*$; in this sense, $D_h(\mathbf{p}^*, \mathbf{p})$ provides an oriented distance measure between $\mathbf{p}^*$ and $\mathbf{p}$ in $\mathcal{D}$.
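The closed form in (38) can be checked numerically against the definition $h(\mathbf{p}^*) - h(\mathbf{p}) - \langle\nabla h(\mathbf{p})\,|\,\mathbf{p}^*-\mathbf{p}\rangle$, using $\partial h/\partial p_s = \log p_s - \log(1-\sum_{s'}p_{s'})$ for $\mathbf{p}$ in the interior of $\mathcal{D}$. The test points are arbitrary.

```python
import math


def h(p):
    """Entropy-like function (37) on the corner-of-cube, with 0 log 0 = 0."""
    q = 1.0 - sum(p)
    return sum(x * math.log(x) for x in p if x > 0) + (q * math.log(q) if q > 0 else 0.0)


def grad_h(p):
    """Gradient of h at an interior point: log p_s - log(1 - sum p)."""
    q = 1.0 - sum(p)
    return [math.log(x) - math.log(q) for x in p]


def bregman_by_definition(p_star, p):
    """h(p*) - h(p) - <grad h(p) | p* - p>."""
    return h(p_star) - h(p) - sum(d * (a - b) for d, a, b in zip(grad_h(p), p_star, p))


def bregman_closed_form(p_star, p):
    """Right-hand side of (38)."""
    q_star, q = 1.0 - sum(p_star), 1.0 - sum(p)
    value = sum(a * math.log(a / b) for a, b in zip(p_star, p) if a > 0)
    return value + (q_star * math.log(q_star / q) if q_star > 0 else 0.0)


p = [0.4, 0.1]
for p_star in ([0.2, 0.3], [0.0, 0.5], [0.4, 0.1]):
    print(p_star, bregman_by_definition(p_star, p), bregman_closed_form(p_star, p))
```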

Employing the Bregman divergence, we can prove the following convergence result: 

Proposition 3. Every solution orbit $\mathbf{p}(t)$ of the dynamics (36) converges to Nash equilibrium in $\mathcal{G}$.


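Proof: Let $\mathbf{p}^*$ be a Nash equilibrium of $\mathcal{G}$, and let $H(t) = D_h(\mathbf{p}^*, \mathbf{p}(t))$. We then have:

$$H = h(\mathbf{p}^*) + \log\Big(1 + \sum_s e^{y_s}\Big) - \sum_s p^*_s\,y_s, \qquad (39)$$

and hence:

$$\dot H = \frac{\sum_s e^{y_s}\,\dot y_s}{1 + \sum_{s'} e^{y_{s'}}} - \sum_s p^*_s\,\dot y_s = \sum_s p_s v_s - \sum_s p^*_s v_s = \sum_s (p_s - p^*_s)\,v_s = \langle\mathbf{p} - \mathbf{p}^*\,|\,\mathbf{v}\rangle. \qquad (40)$$

By concavity of $V$ and the fact that $\mathbf{v} = \nabla_{\mathbf{p}} V$, it follows that $\langle\mathbf{p} - \mathbf{p}^*\,|\,\mathbf{v}\rangle\leq 0$, with equality holding if and only if $\mathbf{p}$ is a maximizer of $V$ (and, hence, a Nash equilibrium of $\mathcal{G}$).

To show that $\mathbf{p}(t)$ converges to a Nash equilibrium of $\mathcal{G}$, assume that $\mathbf{p}^*$ is an $\omega$-limit of $\mathbf{p}(t)$, i.e., $\mathbf{p}(t_n)\to\mathbf{p}^*$ for some increasing sequence $t_n\to\infty$ (that $\mathbf{p}(t)$ admits at least one $\omega$-limit follows from the fact that $\mathcal{D}$ is compact). This implies that $H(t_n)\to 0$, and since $H$ is nonnegative and nonincreasing ($\dot H\leq 0$), we also get $\lim_{t\to\infty} H(t) = 0$, so $\mathbf{p}(t)\to\mathbf{p}^*$ by the definition of the Bregman divergence. ■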
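The Lyapunov argument of Proposition 3 can be observed on a toy problem. The sketch below integrates the single-user dynamics (36) (with $P = 1$) by a forward-Euler scheme, for a hypothetical concave potential $V(\mathbf{p}) = -\|\mathbf{p}-\mathbf{c}\|^2$ whose maximizer $\mathbf{c}$ lies in the interior of $\mathcal{D}$, and records $H(t)$ through (39). The potential, the point $\mathbf{c}$ and the time step are assumptions made for illustration only.

```python
import math

c = [0.2, 0.3, 0.1]  # hypothetical maximizer of the toy potential V(p) = -||p - c||^2


def p_of_y(y):
    """Exponential map with P = 1: p_s = exp(y_s) / (1 + sum_s' exp(y_s'))."""
    z = 1.0 + sum(math.exp(v) for v in y)
    return [math.exp(v) / z for v in y]


def v_field(p):
    """Gradient of the toy potential: v_s = -2 (p_s - c_s)."""
    return [-2.0 * (a - b) for a, b in zip(p, c)]


def lyapunov(p_star, y):
    """H = h(p*) + log(1 + sum_s exp(y_s)) - sum_s p*_s y_s, cf. (39)."""
    q = 1.0 - sum(p_star)
    h = sum(a * math.log(a) for a in p_star if a > 0) + (q * math.log(q) if q > 0 else 0.0)
    return h + math.log(1.0 + sum(math.exp(v) for v in y)) - sum(a * v for a, v in zip(p_star, y))


y = [0.0, 0.0, 0.0]  # initial condition y(0)
dt = 0.01            # Euler time step
history = []
for _ in range(20000):
    history.append(lyapunov(c, y))
    v = v_field(p_of_y(y))
    y = [yi + dt * vi for yi, vi in zip(y, v)]

print(history[0], history[-1], p_of_y(y))
```

The recorded values of $H$ are nonincreasing and tend to zero, while $\mathbf{p}(t)$ approaches $\mathbf{c}$, mirroring (40) and the conclusion of the proof.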

With this result at hand, we have: 

Proof of Theorem 2: We will first show that the basic recursion of Algorithm 1 comprises a stochastic approximation of the dynamics (36) in the sense of [26]. Indeed, it is easy to see that the exponential regularization map (16) is Lipschitz; moreover, since $\mathcal{D}$ is compact and the game’s potential function is smooth on $\mathcal{D}$, it follows that the composite map $\mathbf{y}\mapsto\mathbf{v}(\mathbf{p}(\mathbf{y}))$ is also Lipschitz. As a result, by Propositions 4.2 and 4.1 of [26], we conclude that the recursion

$$\mathbf{y}(n+1) = \mathbf{y}(n) + \gamma_n\,\mathbf{v}(\mathbf{p}(n)), \qquad \mathbf{p}(n+1) = \frac{1}{1 + \sum_{s} e^{y_s(n+1)}}\,\big(e^{y_1(n+1)},\dots,e^{y_S(n+1)}\big),$$

is an asymptotic pseudotrajectory (APT) of the continuous-time dynamics (36).

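Now, let $\mathcal{D}^*$ denote the set of Nash equilibria of $\mathcal{G}$, and assume ad absurdum that $\mathbf{p}(n)$ remains a bounded distance away from $\mathcal{D}^*$. Furthermore, fix some $\mathbf{p}^*\in\mathcal{D}^*$ and let $D_n = D_h(\mathbf{p}^*, \mathbf{p}(n))$: then, using (40), we obtain the Taylor expansion:

$$D_{n+1} = D_h(\mathbf{p}^*, \mathbf{p}(n+1)) = D_h\big(\mathbf{p}^*, \mathbf{p}(\mathbf{y}(n) + \gamma_n\mathbf{v}(\mathbf{p}(n)))\big) \leq D_n - \gamma_n\langle\mathbf{v}(\mathbf{p}(n))\,|\,\mathbf{p}^* - \mathbf{p}(n)\rangle + \tfrac{1}{2}M\gamma_n^2\,\|\mathbf{v}(\mathbf{p}(n))\|^2, \qquad (41)$$

for some constant $M > 0$ (that such a constant exists is a consequence of the fact that $\mathrm{Hess}(h)\succcurlyeq mI$ for some $m > 0$ [33]). Since $\mathbf{p}(n)$ stays a bounded distance away from $\mathcal{D}^*$ (by assumption) and $V$ is concave, we will also have $\langle\mathbf{v}(\mathbf{p}(n))\,|\,\mathbf{p}^* - \mathbf{p}(n)\rangle\geq\delta$ for some $\delta > 0$ and for all $n$. Hence, telescoping (41), we get:

$$D_{n+1}\leq D_0 - \delta\sum_{j=1}^{n}\gamma_j + \frac{M\bar v^2}{2}\sum_{j=1}^{n}\gamma_j^2, \qquad (42)$$

where we have set $\bar v = \sup_{\mathbf{p}\in\mathcal{D}}\|\mathbf{v}(\mathbf{p})\|$. Since $\sum_j\gamma_j = \infty$ and $\sum_{j=1}^{n}\gamma_j^2\big/\sum_{j=1}^{n}\gamma_j\to 0$, this last inequality yields $\lim_{n\to\infty} D_n = -\infty$, a contradiction. We thus conclude that $\mathbf{p}(n)$ visits a compact neighborhood of $\mathcal{D}^*$ infinitely often, so our claim of convergence follows from [26, Theorem 6.10]. ■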
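For a multi-user illustration of the recursion analysed above, the sketch below runs $\mathbf{y}_k(n+1) = \mathbf{y}_k(n) + \gamma_n\mathbf{v}_k(\mathbf{p}(n))$ with $\gamma_n = n^{-0.6}$ on a hypothetical potential $V(\mathbf{p}) = \sum_s[\log(\sigma^2 + w_s) - \lambda_0 w_s]$, i.e., the rate term of (25) with a linear flat-rate charge and no per-user charges, so that $v_{ks} = g_{ks}r_s$ with $r_s = 1/(\sigma^2+w_s) - \lambda_0$ as in (33). The gains, powers and price are made-up numbers. Because the maximizers of this $V$ satisfy $w_s = 1/\lambda_0 - \sigma^2$ on every subcarrier (when that level is reachable), the maximum value $\sum_s[-\log\lambda_0 - 1 + \lambda_0\sigma^2]$ is known in closed form and serves as a check.

```python
import math

K, S = 2, 2
g = [[1.0, 0.5], [0.8, 1.2]]  # hypothetical channel gains g_ks
P = [2.0, 2.0]                # power budgets P_k
sigma2, lam0 = 1.0, 0.5       # noise power and hypothetical linear price


def powers(y):
    """Exponential map: p_ks = P_k exp(y_ks) / (1 + sum_s' exp(y_ks'))."""
    out = []
    for k in range(K):
        z = 1.0 + sum(math.exp(v) for v in y[k])
        out.append([P[k] * math.exp(v) / z for v in y[k]])
    return out


def interference(p):
    """Aggregate interference w_s = sum_k g_ks p_ks."""
    return [sum(g[k][s] * p[k][s] for k in range(K)) for s in range(S)]


def potential(p):
    """Toy potential V(p) = sum_s [log(sigma2 + w_s) - lam0 * w_s]."""
    return sum(math.log(sigma2 + w) - lam0 * w for w in interference(p))


def gradient(p):
    """v_ks = g_ks * r_s with r_s = 1 / (sigma2 + w_s) - lam0."""
    w = interference(p)
    return [[g[k][s] * (1.0 / (sigma2 + w[s]) - lam0) for s in range(S)] for k in range(K)]


y = [[0.0] * S for _ in range(K)]
for n in range(1, 20001):
    gamma = n ** -0.6  # step-size gamma_n = n^(-0.6)
    v = gradient(powers(y))
    y = [[y[k][s] + gamma * v[k][s] for s in range(S)] for k in range(K)]

v_max = S * (-math.log(lam0) - 1.0 + lam0 * sigma2)  # closed-form maximum of the toy V
print(potential(powers(y)), v_max)
```

Since this toy potential has no strictly increasing per-user charges, its maximum set is not a single point; the iterates still approach the maximum value, in line with convergence to the maximum set of the potential.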


C. The fast-fading case 

Our goal in this appendix is to prove uniqueness of NE in the ergodic game $\bar{\mathcal{G}}$ (Prop. 2) and the convergence of Algorithm 1 in the presence of fast fading.

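Proof of Proposition 2: That $\bar V$ is an exact potential for $\bar{\mathcal{G}}$ follows directly by inspection, as in the case of Proposition 1. For the strict concavity of $\bar V$, let $\mathrm{Hess}_{\mathbf{g}}(V)$ denote the Hessian of the static potential function $V$ for a given realization of the channel gain coefficients $\mathbf{g}$. Then, with $V$ bounded and smooth over $\mathcal{X}$, the dominated convergence theorem allows us to interchange differentiation and integration, so we obtain $\mathrm{Hess}(\bar V) = \mathbb{E}_{\mathbf{g}}[\mathrm{Hess}_{\mathbf{g}}(V)]$. Thus, for all $\mathbf{z}\in\mathbb{R}^{KS}$, we will have:

$$\mathbf{z}^\top\,\mathrm{Hess}(\bar V)\,\mathbf{z} = \mathbb{E}_{\mathbf{g}}\big[\mathbf{z}^\top\,\mathrm{Hess}_{\mathbf{g}}(V)\,\mathbf{z}\big]\leq 0. \qquad (43)$$

From the proof of Theorem 1, we know that $\mathbf{z}^\top\,\mathrm{Hess}_{\mathbf{g}}(V)\,\mathbf{z} = 0$ only if $\sum_k g_{ks}z_{ks} = 0$ for all $s\in\mathcal{S}$; however, since this is a measure zero event for $\mathbf{z}\neq 0$ (recall that the law of $\mathbf{g}$ is atom-free), we will have $\mathbf{z}^\top\,\mathrm{Hess}_{\mathbf{g}}(V)\,\mathbf{z} < 0$ on a set of positive measure. This shows that $\mathbf{z}^\top\,\mathrm{Hess}(\bar V)\,\mathbf{z} < 0$ for all nonzero $\mathbf{z}\in\mathbb{R}^{KS}$, i.e., $\bar V$ is strictly concave. We conclude that $\bar{\mathcal{G}}$ admits a unique equilibrium, as claimed. ■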

Proof of Theorem 3: The same reasoning as in the proof of Theorem 2 shows that the iterates of Algorithm 1 run with the players’ instantaneous utilities calculated as in (21) comprise a stochastic approximation (asymptotic pseudotrajectory) of the mean dynamics:

$$\dot{\mathbf{y}}_k = \mathbf{v}_k(\mathbf{p}), \qquad p_{ks} = P_k\,\frac{\exp(y_{ks})}{1 + \sum_{s'\in\mathcal{S}}\exp(y_{ks'})}. \qquad (44)$$

Again, by following the same steps as in the proof of Theorem 2, we can show that the dynamics (44) converge to the unique Nash equilibrium of the ergodic game $\bar{\mathcal{G}}$; as such, it suffices to show that any APT of (44) induced by Alg. 1 converges to equilibrium.

To that end, with notation as in (41), we readily obtain:

$$D_{n+1} = D_h(\mathbf{p}^*, \mathbf{p}(n+1))\leq D_n - \gamma_n\langle\mathbf{v}(n)\,|\,\mathbf{p}^* - \mathbf{p}(n)\rangle + \tfrac{1}{2}M\gamma_n^2\,\|\mathbf{v}(n)\|^2, \qquad (45)$$

where $\mathbf{p}^*$ is the (unique) NE of $\bar{\mathcal{G}}$ and $M > 0$ is a positive constant. Assume now that $\mathbf{p}(n)$ remains a bounded distance away from $\mathbf{p}^*$ (so $D_n$ is bounded away from zero), and let $\xi_n = \langle\mathbf{v}(n) - \mathbf{v}(\mathbf{p}(n))\,|\,\mathbf{p}(n) - \mathbf{p}^*\rangle$. Since $\bar V$ is (strictly) concave and $\mathbf{p}(n)$ stays a bounded distance away from its maximum set, we will have $\langle\mathbf{v}(\mathbf{p}(n))\,|\,\mathbf{p}^* - \mathbf{p}(n)\rangle\geq m$ for some positive constant $m > 0$. Hence, telescoping (45) yields:

$$D_{n+1}\leq D_0 - t_n\Big[m - \sum_{j=1}^{n} w_{j,n}\,\xi_j\Big] + \frac{M}{2}\sum_{j=1}^{n}\gamma_j^2\,\|\mathbf{v}(j)\|^2, \qquad (46)$$

where $t_n = \sum_{j=1}^{n}\gamma_j$ and $w_{j,n} = \gamma_j/t_n$. By the strong law of large numbers for martingale differences [34, Theorem 2.18], we will have $n^{-1}\sum_{j=1}^{n}\xi_j\to 0$ (a.s.); hence, with $\gamma_{n+1}/\gamma_n\leq 1$, Hardy’s weighted summability criterion [35, p. 58] applied to the weight sequence $w_{j,n} = \gamma_j/t_n$ yields $\sum_{j=1}^{n} w_{j,n}\,\xi_j\to 0$ (a.s.). Finally, since $\gamma_n$ is square-summable and $\mathbf{v}(n) - \mathbf{v}(\mathbf{p}(n))$ is a martingale difference with finite variance, it follows that $\sum_n\gamma_n^2\,\|\mathbf{v}(n)\|^2 < \infty$ (a.s.) by Theorem 6 in [36].

Combining all of the above, we obtain that the RHS of (46) tends to $-\infty$ (a.s.); this contradicts the fact that $D_n\geq 0$, so we conclude that $\mathbf{p}(n)$ visits a compact neighborhood of $\mathbf{p}^*$ infinitely often. Since $\mathbf{p}^*$ is a global attractor of (44), Theorem 6.10 in [26] shows that $\mathbf{p}(n)$ converges to $\mathbf{p}^*$ (a.s.). ■
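The role of the martingale-difference terms in (45)-(46) can be illustrated by feeding the recursion unbiased but noisy gradient estimates. In the sketch below the noise is i.i.d. zero-mean Gaussian added to the gradient of a hypothetical single-user potential $V(\mathbf{p}) = -\|\mathbf{p}-\mathbf{c}\|^2$; this is a stand-in for the fading-induced randomness of (21), not the paper’s channel model. With the square-summable step-size $\gamma_n = n^{-0.6}$, the iterates still settle near $\mathbf{c}$.

```python
import math
import random

c = [0.2, 0.3, 0.1]    # hypothetical maximizer of the toy potential
noise_std = 0.2        # std of the zero-mean gradient noise (assumed)
rng = random.Random(1)  # seeded for reproducibility


def p_of_y(y):
    """Exponential map with P = 1."""
    z = 1.0 + sum(math.exp(v) for v in y)
    return [math.exp(v) / z for v in y]


def noisy_gradient(p):
    """Unbiased estimate of v(p) = -2 (p - c): exact gradient plus Gaussian noise."""
    return [-2.0 * (a - b) + rng.gauss(0.0, noise_std) for a, b in zip(p, c)]


y = [0.0, 0.0, 0.0]
for n in range(1, 20001):
    gamma = n ** -0.6  # square-summable, non-summable step-size
    v = noisy_gradient(p_of_y(y))
    y = [yi + gamma * vi for yi, vi in zip(y, v)]

print(p_of_y(y))
```

With a constant step-size in place of $\gamma_n$, the same noise would keep the iterates fluctuating around $\mathbf{c}$ instead of settling.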